nf-core / scrnaseq

A single-cell RNAseq pipeline for 10X genomics data
https://nf-co.re/scrnaseq
MIT License
216 stars 172 forks source link

Update or remove universc #289

Open grst opened 10 months ago

grst commented 10 months ago

Description of feature

Universc is currently broken (it likely was for a while, because tests were running with the -stub option).

Prerequisite to make it work would be to finalize the module update (https://github.com/nf-core/modules/pull/3493).

Personally, I am unsure how much value it adds as kallisto/alevin-fry/starsolo nowadays support quite flexible read structure definitions. I am not using universc and don't have the capacity to work on this - @TomKellyGenetics @adamrtalbot are you still interested in maintaining this functionality?

adamrtalbot commented 10 months ago

I agree. Let's get rid of it 👍

TomKellyGenetics commented 10 months ago

I'm quite busy at the moment but should be able to handle it. It has a very similar CLI to Cell Ranger so if that is already supported, it wouldn't be too much trouble to migrate changes over.

adamrtalbot commented 10 months ago

The CLI might be the same but as I remember it was fundamentally broken. The logs have disappeared on this run where I tried to fix it so I can't see why: https://github.com/nf-core/modules/actions/runs/6402528495/job/17379314958

Either it should be fixed or removed.

grst commented 10 months ago

Also the cellranger module has diverged since that: It's using cellranger 7, vs. universc is version 2/3, and the cellranger module uses a python script including file renaming to call cellranger. The synergies are rather limited at this point.

TomKellyGenetics commented 10 months ago

Renaming won't be necessary for UniverSC, it checks for compatible inputs and renames before calling Cell Ranger already (i.e., the functionality of the Python script is already integrated into it). The input arguments are the same for scRNA-Seq except for the additional "technology" parameter. I had tests for both passing locally and on GitHub Actions on the same test data at the time of merging. For more details, see the testing of the module before creating the subworkflow.

It should be trivial to fix the call on the same test data. I suspect the changes are to the test job or reference index (note the same STAR version is required for indexing and alignment in either case). Given there is an open-source implementation of this, it is unclear to me why efforts have been made to maintain a licensed proprietary version instead.

adamrtalbot commented 10 months ago

Given there is an open-source implementation of this, it is unclear to me why efforts have been made to maintain a licensed proprietary version instead.

The licensing for cellranger is bizarre to say the least. cellranger 2 is abandonware with an accessible license, cellranger 3 onwards is closed source. But at version 7.2.0 they released all code to Github but with a non-open source license. It's pretty clear we can't repackage v7.2.0 so we're stuck with v2.

You may not use, propagate or modify the Software, or any derivatives thereof, except as expressly provided herein. Any attempt otherwise to use, propagate or modify it is void, and will automatically terminate your rights under this license.

Although, that is not a software license I've ever seen before.

I too think we shouldn't be maintaining a cellranger module, but I would go further and remove anything without an open source license 😝 but I know how popular cellranger is.

grst commented 9 months ago

I agree the cellranger license is pathetic, but we have the permission to distribute it as part of this pipeline and it's still a popular tool (e.g. I work in pharma and they prefer to use it because then they can blame 10x if something is wrong). And for the multiomics assays, the open source tools haven't caught up yet.

But if you want to use an open source aligner, why not go for starsolo/alevin/kallisto which all support flexible chemistry definitions now and are much faster than cellranger 2? I.e. what exactly is the niche for universc?

I'm happy to keep it if someone makes it work again and adds a nf-test testcase, but I don't have the bandwidth to do it.

TomKellyGenetics commented 9 months ago

Although, that is not a software license I've ever seen before.

Yeah it’s their own custom license, which their general counsel kindly reminded us of when they became aware of our open-source implementation based off an MIT licensed repository (of version 3 which was then hastily updated).

I'm happy to keep it if someone makes it work again and adds a nf-test testcase

Thanks for taking it into consideration. The main motivation is to provide the same functionality as Cell Ranger without the constraints of the license as both the EULA and hardcoded parameters do not allow other kits. I believe the interface we provide is a convenient way to interact with it.

I’m willing to set up a test job and resubmit it as a PR but I’ve moved onto another job and we have deadlines for the end of the fiscal year (on other projects using different technologies). I’d like to revisit this when I have more time for older projects.

For context, the subworkflow was not heavily tested and was merged quickly as a drop-in replacement for Cell Ranger. However, the nf-core “module” that it calls was extensively tested and discussed in a separate PR. There was some overlap in reviewers which may explain why they assumed the new module (they’d already reviewed) would function as claimed. Unfortunately as a result, I’m unsure if the test job was originally flawed or if changes to the pipeline since have rendered it incompatible. Apologies for the inconvenience but I will attempt to rectify this when I have more time.