theislab / hadge

Comprehensive pipeline for donor demultiplexing in single cell
https://hadge.readthedocs.io/en/latest/
MIT License

improve description of inputs #39

Open bio-la opened 5 months ago

bio-la commented 5 months ago

@mari-ga and @wxicu please modify the section on minimum required inputs following this:

The hashing and genetic modes are two independent workflows, and the rescue mode performs joint demultiplexing by combining the outputs of the two. The two workflows require different inputs, specifically:
For the hashing workflow, the minimum required input is the raw and filtered HTO and RNA counts. These are normally generated by the cellranger pipeline, which writes the required HTO and RNA counts as unfiltered (raw) and filtered feature-barcode matrices in two file formats: Market Exchange Format (MEX) and Hierarchical Data Format (HDF5).
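
As a rough illustration of the hashing inputs described above, the following sketch checks that all four count matrices are present in either MEX or HDF5 form. The dictionary keys (`hto_raw`, `hto_filtered`, `rna_raw`, `rna_filtered`) and both function names are hypothetical and are not hadge parameters. The three MEX file names assume the gzipped layout written by recent cellranger versions, and the `.h5` suffix is assumed for HDF5 output.

```python
from pathlib import Path

# Assumed MEX layout: the gzipped trio written by recent cellranger versions.
MEX_FILES = ("barcodes.tsv.gz", "features.tsv.gz", "matrix.mtx.gz")

# Hypothetical keys for the four matrices the hashing workflow needs.
REQUIRED_MATRICES = ("hto_raw", "hto_filtered", "rna_raw", "rna_filtered")


def matrix_format(path):
    """Return 'hdf5', 'mex', or None for a feature-barcode matrix path."""
    p = Path(path)
    if p.is_file() and p.suffix == ".h5":
        return "hdf5"
    if p.is_dir() and all((p / name).is_file() for name in MEX_FILES):
        return "mex"
    return None


def missing_hashing_inputs(inputs):
    """List the required matrices that are absent or in an unrecognised format."""
    missing = []
    for key in REQUIRED_MATRICES:
        value = inputs.get(key)
        if not value or matrix_format(value) is None:
            missing.append(key)
    return missing
```

An empty list from `missing_hashing_inputs` would mean all four matrices were found; any returned key points at the matrix that still needs to be supplied.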

@mari-ga please include: references to cellranger and which version produces these outputs

For the genetic workflow, the minimum requirements are: the indexed sequence alignment file in BAM format along with its index (.BAI format), the barcodes of the cell-containing droplets in a TSV file, the number of donors expected in the mixture, and the reference genotypes and the variants present in the pooled sample, both in VCF format. The VCF of the reference genotypes can be an unrelated genomic reference, to run methods in “genotype-free” mode. Optionally, if the pooled sample’s VCF is not available, we include two processes for variant calling (cellsnp and freebayes). Users can provide the mixed FASTA file, which is used as input to generate the VCF file with freebayes, the default preprocessing for the scSplit method. All of the inputs, except for the reference VCF files, are commonly generated by the cellranger pipeline. Following deconvolution in each workflow, the output files are passed to the summary

@wxicu there is no mention of the number of donors expected in a mixture, which is a required input and will break the pipeline if not specified
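
To make the genetic-workflow requirements concrete, including the expected number of donors, here is a minimal sketch of an up-front check. All names (`check_genetic_inputs`, the dictionary keys, the `call_variants` flag) are hypothetical and do not correspond to hadge parameters. The only assumption carried over from the text is that the pooled-sample VCF may be omitted when a variant-calling step produces it.

```python
from pathlib import Path


def check_genetic_inputs(inputs, call_variants=False):
    """Return a list of problems with the genetic-workflow inputs.

    `inputs` is a plain dict with hypothetical keys; `call_variants=True`
    means the pooled-sample VCF will be generated by a variant-calling step
    and therefore does not have to be supplied.
    """
    problems = []

    file_keys = ["bam", "bai", "barcodes", "reference_vcf"]
    if not call_variants:
        file_keys.append("pooled_vcf")

    for key in file_keys:
        value = inputs.get(key)
        if not value or not Path(value).is_file():
            problems.append(f"missing file for '{key}'")

    # The number of donors must be given explicitly as a positive integer.
    n_donors = inputs.get("n_donors")
    if isinstance(n_donors, bool) or not isinstance(n_donors, int) or n_donors < 1:
        problems.append("'n_donors' must be a positive integer")

    return problems
```

Failing early with a message about `n_donors` is the design point here: a missing donor count is reported before any demultiplexing method starts, rather than surfacing as a failure midway through a run.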

wxicu commented 5 months ago

@bio-la do we keep the tables or remove them? here is the latest version https://github.com/theislab/hadge/blob/docs/docs/source/genetic.md

So I copy the text at the beginning of the doc, followed by 2 tables and the additional parameters, and then the use cases.

bio-la commented 5 months ago

ok!

Zethson commented 1 month ago

Is this done by the way? Then we can close it

wxicu commented 1 month ago

@mari-ga Is there any update on the hashing part?

mari-ga commented 1 month ago

I changed the envs, added cellhashR as a bioconda package and added some minor fixes. I'll open a PR today.