nf-core / demultiplex

Demultiplexing pipeline for sequencing data
https://nf-co.re/demultiplex
MIT License

Add UniverSC to support other technologies #14

Open TomKellyGenetics opened 3 years ago

TomKellyGenetics commented 3 years ago

We've recently released an open-source tool to expand the functionality of Cell Ranger to apply to other technologies. We provide a Docker container under an open-source license (see #2) which can be used to run Cell Ranger on other technologies without violating the 10X Genomics EULA. Note that this only applies to scRNA-Seq techniques at this stage.

GitHub repo: https://github.com/minoda-lab/universc
Docker container: https://hub.docker.com/r/tomkellygenetics/universc
Manuscript: https://www.biorxiv.org/content/10.1101/2021.01.19.427209v1

I'm willing to prepare a PR to call UniverSC in place of Cell Ranger if there is interest in doing this.

matthdsm commented 2 years ago

Hi,

If you're still interested, would you be willing to add your tool as a module to nf-core/modules?

Matthias

TomKellyGenetics commented 2 years ago

Sorry for the late response, we've just had a holiday period here in Japan so I have been busy with family.

I think that is a great idea. We are in the final stages of preparing a revised manuscript, so I think the tool will be published soon. A Nextflow module to accompany that would be ideal.

Where do I start?

matthdsm commented 2 years ago

Wow, Japan! Talk about a global network 😄 !

I see you already found your way on your own! You can create a new module by following this tutorial. Don't hesitate to ping me for a review of your PR!

TomKellyGenetics commented 2 years ago

Thank you, I've created a new branch here: https://github.com/TomKellyGenetics/nf-core-modules/tree/universc

I'll submit a new PR to the modules repo when I have completed it. The template is very helpful.

While we are still working towards publishing UniverSC, I have started a new position so my time is limited on top of family commitments. Sorry it may take time to get back to this but I am still interested in doing it.

matthdsm commented 2 years ago

Great. Take your time, we can include it whenever it's ready.

TomKellyGenetics commented 2 years ago

Update: The module is passing unit tests and is currently under review. https://github.com/nf-core/modules/pull/1706

I'd expect it will be ready to integrate into pipelines (e.g., nf-core/demultiplex or nf-core/scrnaseq) in the near future.

TomKellyGenetics commented 1 year ago

Update: the module running UniverSC v1.2.5.1 is stable so I think it is ready to proceed with developing a subworkflow to run it for scRNA-Seq as discussed with @apeltzer. https://github.com/nf-core/scrnaseq/issues/170 https://github.com/nf-core/scrnaseq/pull/185

I think similar code could be used to run demultiplexing as well, if it is beneficial to apply different implementations to each pipeline.

TomKellyGenetics commented 1 year ago

This tool has now been merged into nf-core/modules and can be used as a drop-in replacement for Cell Ranger with exclusively open-source dependencies. It also supports a "technology" parameter to configure the run for data generated from other platforms or protocols, in addition to 10X Genomics Chromium.

TomKellyGenetics commented 1 year ago

Is there a reason that "cellranger" (https://github.com/nf-core/demultiplex/pull/7) is no longer available? I noticed it is no longer documented and has been removed (https://github.com/nf-core/demultiplex/pull/28).

As previously described, this is an open-source alternative with similar functionality which is currently under consideration for scRNA-Seq (https://github.com/nf-core/scrnaseq/pull/185). If there are reasons I am not aware of for not including Cell Ranger in the current release, I will take these into consideration. If the plan is to add Cell Ranger or equivalent functionality after migrating to DSL2, I can support that. I am familiar with the Cell Ranger module (which supports the newer licensed version you used before) as well.

edmundmiller commented 1 year ago

> Is there a reason that "cellranger" is no longer available?

Time 😛 We also haven't had anyone who wants to maintain or add it, or who has requested it, since v1.0.0. If you're volunteering, I'm happy to see it added! I think UniverSC sounds more appealing to me personally.

I also thought scrnaseq ran the demux, but I took a look at the pipeline, and it looks like they don't?

apeltzer commented 1 year ago

@Emiller88 No, we don't in scrnaseq. Most people currently handle single-cell demultiplexing (scdemux) with cellranger mkfastq. If @TomKellyGenetics wants to add universc, I'd be more than happy to let that go and not add the cellranger mkfastq to this workflow.

edmundmiller commented 1 year ago

Are the results from universc and cellranger the same? Is there any tradeoff between the two? We could also have both.

apeltzer commented 1 year ago

Something I'm leaving for @TomKellyGenetics to answer; I have not performed a deep dive yet.

TomKellyGenetics commented 1 year ago

> Time 😛 We also haven't had anyone who wants to maintain or add it, or who has requested it, since v1.0.0. If you're volunteering, I'm happy to see it added! I think UniverSC sounds more appealing to me personally.

Thanks for clarifying, @Emiller88, that’s reassuring. It helps to understand the current state of the project and make sure no one else has plans for it. It is understandable that the change to DSL2 would be time-consuming and that other work was prioritised.

I'm open to maintaining it after contributing but I would be doing it in my own time as I’ve changed jobs and have young kids. Please understand there may be responses at unusual times as I am based in an Asian timezone.

> I'd be more than happy to let that go and not add the cellranger mkfastq to this workflow....

@apeltzer that may be possible. UniverSC was intended to offer similar functionality to cellranger count, but it does support technologies that use the I1 and I2 indexes for cell barcodes instead of multiplexed samples. There are differences in how it processes FASTQ files, so I generally recommend demultiplexing samples with bcl2fastq or the newer BCL Convert by Illumina (supported by BaseSpace and DRAGEN) before running UniverSC.

This is because it is intended to run samples other than 10X Chromium (already supported by Cell Ranger mkfastq and count). However, I was able to set up an open-source version of both functions. The UniverSC docker image includes cellranger mkfastq with the same functionality as Cell Ranger v3.0.2. Perhaps this would be useful for demultiplexing. In this case, they are interchangeable.
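A minimal sketch of that pre-demultiplexing step, assuming the standard Illumina command-line options (`--runfolder-dir`, `--output-dir` and `--sample-sheet` for bcl2fastq; `--bcl-input-directory`, `--output-directory` and `--sample-sheet` for bcl-convert). The helper name `demux_command` and the paths are hypothetical and only illustrate how the call could be assembled; nothing here is taken from the nf-core modules, and the command is built but not executed.

```python
from pathlib import Path
from typing import List


def demux_command(tool: str, run_dir: Path, out_dir: Path, sample_sheet: Path) -> List[str]:
    """Build an argv list for an Illumina demultiplexer (hypothetical helper).

    The command is only assembled here; run it with subprocess.run(cmd, check=True)
    on a machine where the chosen tool is installed.
    """
    if tool == "bcl2fastq":
        return [
            "bcl2fastq",
            "--runfolder-dir", str(run_dir),
            "--output-dir", str(out_dir),
            "--sample-sheet", str(sample_sheet),
        ]
    if tool == "bcl-convert":
        return [
            "bcl-convert",
            "--bcl-input-directory", str(run_dir),
            "--output-directory", str(out_dir),
            "--sample-sheet", str(sample_sheet),
        ]
    raise ValueError(f"unsupported demultiplexer: {tool!r}")


# Example with placeholder paths (assumptions, not real data locations).
cmd = demux_command(
    "bcl-convert",
    Path("run_folder"),
    Path("fastq_out"),
    Path("SampleSheet.csv"),
)
print(" ".join(cmd))
```

The resulting FASTQ files would then be passed to UniverSC as its read inputs.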

edmundmiller commented 1 year ago

> I'm open to maintaining it after contributing but I would be doing it in my own time as I’ve changed jobs and have young kids. Please understand there may be responses at unusual times as I am based in an Asian timezone.

Awesome! You won't be maintaining it alone, I'm hoping we can keep this as a joint effort.

> In this case, they are interchangeable.

So it sounds like supporting both would make sense? Anyone running 10x Chromium would want cellranger mkfastq and anyone running any other samples would want the flexibility of UniverSC.

I'm just afraid if we only have UniverSC, then there will be plenty of people who want Cellranger.
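To make the "support both" idea concrete, here is a rough sketch of how a pipeline option could route samples to one tool or the other. The function name `pick_demultiplexer` and the technology labels are hypothetical and have not been checked against the values either tool accepts.

```python
# Labels treated as 10x Chromium in this sketch (assumed, illustrative only).
TENX_LABELS = {"10x", "chromium"}


def pick_demultiplexer(technology: str) -> str:
    """Return the tool suggested in this thread for a given technology label."""
    normalized = technology.strip().lower()
    if not normalized:
        raise ValueError("technology must not be empty")
    if normalized in TENX_LABELS:
        # 10x Chromium users are expected to want the vendor tool.
        return "cellranger mkfastq"
    # Any other platform goes to the open-source alternative.
    return "universc"


for label in ("10x", "dropseq"):
    print(label, "->", pick_demultiplexer(label))
```

Defaulting to UniverSC for unknown labels keeps the sketch short; a real pipeline would probably validate the label against a known list instead.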

TomKellyGenetics commented 1 year ago

> Awesome! You won't be maintaining it alone, I'm hoping we can keep this as a joint effort.

Fantastic, a community effort would be ideal in my view. This is something I'll gladly support depending on my availability.

> I'm just afraid if we only have UniverSC, then there will be plenty of people who want Cellranger.

That's understandable; they have equivalent functionality for different data. The API is very similar by design (so anyone familiar with Cell Ranger should be able to use our open-source tools). If we have a subworkflow working for Cell Ranger or UniverSC, it is not much trouble to migrate a copy of it to the other one.

This is exactly what we've done for cellranger mkref and cellranger count (https://github.com/nf-core/scrnaseq/pull/185). So I think doing the same with cellranger mkfastq is possible. Then users can choose whether to use the latest licensed version of Cell Ranger or the open-source version without EULA restrictions on applying it to other data.

Porting this to UniverSC is less important because cellranger mkfastq has settings specific to the 10X Genomics index sequences (4 indexes per sample), so it has been a lower priority. I can still support implementing it if there is interest.

Sorry, I may have misunderstood the scope of the demultiplexing pipeline (which does not implement counts per cell?) because I was involved with the scRNA-Seq pipeline. If it is possible to integrate the 2 pipelines (or at least document how they work together for single-cell analyses), that would be beneficial. Honestly, a lack of documentation and standard workflows for demultiplexing is a weak point of UniverSC, which I developed during my postdoc (and one I am actively working on for other projects now).

For the above reasons, I'd suggest starting by implementing cellranger mkfastq and adding a separate subworkflow for the open-source UniverSC container later if it is still needed.