scharch / SONAR

Software for Ontogenic aNalysis of Antibody Repertoires
GNU General Public License v3.0

SONAR with single cell sequencing?? #7

Closed methornton closed 4 years ago

methornton commented 4 years ago

Hello! I was wondering whether, if you separated each barcoded cell's reads into its own fastq file (1,000 to 10,000 fastq files per 10x Genomics experiment), SONAR would be able to process them and draw useful comparisons. What would be the major limitations of doing this? I assume that filtering on the antibody framework regions would restrict the analysis to barcoded cells that actually contain antibody reads. The barcoded cells with framework regions could also be merged into an in silico "bulk" RNA-seq data set. Has anyone done this? I expect that the "LIBRA-seq" software for 10x Genomics data is going to be proprietary, so not as useful to the public research community.

scharch commented 4 years ago

SONAR cannot process standard 10x data - it could partition the reads by cell/UMI easily enough, but it has no engine to assemble the contigs from those partitioned reads. I suppose if you really wanted you could then pass the partitioned reads to something like Trinity, but I wouldn't trust the output without a whole lot of testing and parameter adjustment.

Instead, I would recommend running cellranger vdj and then dumping the contents of filtered_contig_annotations.csv to a fasta so that SONAR can re-annotate them and do lineage analysis.
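As a rough illustration of the CSV-to-fasta step, here is a minimal Python sketch. It is not part of SONAR or cellranger. The function name `contigs_to_fasta` is hypothetical, and the column names `contig_id` and `sequence` are assumptions about the CSV layout; whether the file carries a full contig sequence column may depend on the cellranger version, so check the header and adjust `id_col` and `seq_col` to match.

```python
import csv


def contigs_to_fasta(csv_path, fasta_path, id_col="contig_id", seq_col="sequence"):
    """Write one fasta record per CSV row that has a non-empty sequence.

    id_col and seq_col are assumed column names; pass the ones your file uses.
    Returns the number of records written.
    """
    written = 0
    with open(csv_path, newline="") as src, open(fasta_path, "w") as dst:
        for row in csv.DictReader(src):
            seq = (row.get(seq_col) or "").strip()
            if not seq:
                # Skip rows without a sequence instead of writing empty records.
                continue
            dst.write(f">{row[id_col]}\n{seq}\n")
            written += 1
    return written
```

The resulting fasta can then be handed to SONAR for re-annotation as described above.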

(NB the single cell functionality in 1.0-preprocess.py is intended to handle an alternate sequencing strategy in which we use custom primers to enrich the cDNA from the 10x machine, followed by standard amplicon sequencing. The documentation should probably make this clearer than it currently does.)

I believe that there are plans to eventually release a public version of the LIBRAseq software, but it might be a while, as they are apparently integrating more with cellranger first. In any case, you can definitely do a version of LIBRAseq with SONAR now: pass the LIBRAseq libraries to 1.0-preprocess.py using the --featureLibrary option and (if necessary) a dummy data file for --input. The read counts for each antigen will be written to a table in output/tables/<project>_features.tsv, which you can open in R to do the thresholding, CLR, and LIBRAseq score calculation with a fairly simple custom script. It's also likely that the LIBRAseq score will be directly integrated into SONAR in the relatively near future (1-2 months).
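For anyone who wants a starting point for that custom script, here is a minimal sketch of the three steps in Python rather than R. It is not SONAR code and not the official LIBRAseq implementation. It assumes the counts have already been loaded into a dict mapping each cell to a list of per-antigen counts, and that the score is a per-cell centered log-ratio (CLR) followed by a per-antigen z-score. The function names, the pseudocount of 1, and the `min_count` threshold are all illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of one cell's antigen counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    center = mean(logs)
    return [x - center for x in logs]


def libra_like_scores(table, min_count=4):
    """table maps cell -> list of antigen counts (same antigen order per cell).

    Steps: zero out counts below min_count, CLR-transform each cell,
    then z-score each antigen across cells. Returns cell -> list of scores.
    """
    if not table:
        return {}
    cells = list(table)
    # Thresholding: counts below min_count are treated as background.
    kept = {c: [n if n >= min_count else 0 for n in table[c]] for c in cells}
    clr_vals = {c: clr(kept[c]) for c in cells}
    n_antigens = len(table[cells[0]])
    scores = {c: [] for c in cells}
    for j in range(n_antigens):
        column = [clr_vals[c][j] for c in cells]
        mu, sd = mean(column), pstdev(column)
        for c in cells:
            # An antigen with no variation across cells gets a score of 0.
            scores[c].append((clr_vals[c][j] - mu) / sd if sd > 0 else 0.0)
    return scores
```

The threshold and pseudocount would need tuning against your own data, and the layout of the features table should be checked before loading it into this shape.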

methornton commented 4 years ago

Sweet! I will try that.