Closed Davidwei7 closed 1 year ago
Hi all, I was wondering if you have had a chance to look into the issue I am experiencing, which is described in two posts? Thank you in advance. Looking forward to your response. Best Wishes, David
Hi, sorry for the delayed response. I don't have much time at the moment but I have some ideas on what may be causing this. I think it is unrelated to the technology. The "proc" command is used to detect the number of cores available to set the default number of threads.
Please try running it again with the number of threads set manually with --threads and let us know if the problem persists. I'll note that your system configuration differs from ours, as discussed in previous issues, so some dependencies may be missing.
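A minimal sketch of setting the thread count explicitly, assuming the core detection mentioned above refers to the coreutils `nproc` utility (with `getconf` as a fallback where `nproc` is missing). The FASTQ names, reference path, and sample ID are placeholders, and the launch command is only printed here, not executed.

```shell
# Detect available cores; fall back to 1 if neither tool is present.
detect_threads() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif getconf _NPROCESSORS_ONLN >/dev/null 2>&1; then
    getconf _NPROCESSORS_ONLN
  else
    echo 1
  fi
}

threads="$(detect_threads)"
echo "Detected ${threads} core(s)"

# Print (do not run) a launch command with the thread count set by hand.
# File names, reference path, and sample ID below are placeholders.
echo launch_universc.sh --threads "${threads}" -t strt-seq \
  -R1 sample_S1_L001_R1_001.fastq -R2 sample_S1_L001_R2_001.fastq \
  -r /path/to/reference -i sample
```

If `detect_threads` fails or prints nothing on the cluster, that would point to the environment rather than the technology setting.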
This dataset appears to use SmartSeq2.
Construction protocol: A modified Smart-seq2 protocol was applied for single-cell RNA-seq. Briefly, a single cell was picked into the lysis buffer by mouth pipette. The reverse transcription reaction was performed with 24 oligo (dT) primer anchored with the 8 bp cell specific barcode, and also with 8 bp unique molecular identifiers (UMIs).
https://www.ncbi.nlm.nih.gov/sra/?term=SRR6026844 https://trace.ncbi.nlm.nih.gov/Traces/?run=SRR6026844
Note that our code supports the following configurations:
launch_universc.sh: STRT-Seq (6 bp barcode, no UMI): strt-seq
launch_universc.sh: STRT-Seq-C1 (8 bp barcode, 5 bp UMI): strt-seq-c1
launch_universc.sh: STRT-Seq-2i (13 bp barcode, 6 bp UMI): strt-seq-2i
It is possible to support an 8bp UMI but it will require a dedicated configuration. If it is a popular protocol we can support this but it appears to be a custom workflow used in this paper. Another possible workaround is to rename R1 and R2 (manually switch them) and run custom_8_8 which assumes R1 contains [BC][UMI].... and R2 contains transcript reads (as for 10x settings).
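The read-swapping workaround could be sketched roughly as follows. `swap_reads` is a hypothetical helper (not part of UniverSC) that links the two files into a new directory with their R1/R2 names exchanged, so the originals stay untouched. Passing `custom_8_8` through `-t` in the commented command is an assumption based on how the other technology names are used in this thread.

```shell
# swap_reads R1 R2 OUTDIR
# Link the files into OUTDIR with the R1 and R2 names exchanged, so that the
# file named "R1" holds the [BC][UMI] reads and "R2" holds the transcript reads.
swap_reads() {
  r1="$1"; r2="$2"; outdir="$3"
  mkdir -p "$outdir"
  # Resolve absolute paths so the symlinks stay valid from any directory.
  abs_r1="$(cd "$(dirname "$r1")" && pwd)/$(basename "$r1")"
  abs_r2="$(cd "$(dirname "$r2")" && pwd)/$(basename "$r2")"
  ln -sf "$abs_r2" "$outdir/$(basename "$r1")"
  ln -sf "$abs_r1" "$outdir/$(basename "$r2")"
}

# Example (placeholders; uncomment and adjust to your data):
# swap_reads SRR6026844_sra_S1_L001_R1_001.fastq \
#            SRR6026844_sra_S1_L001_R2_001.fastq swapped
# launch_universc.sh -t custom_8_8 \
#   -R1 swapped/SRR6026844_sra_S1_L001_R1_001.fastq \
#   -R2 swapped/SRR6026844_sra_S1_L001_R2_001.fastq \
#   -r /path/to/reference -i SRR6026844
```

Symlinks avoid duplicating large FASTQ files; if the container cannot follow them, copying the files under the swapped names should be equivalent.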
I'll note the paper here to investigate later in more detail:
Fan X et al., "Spatial transcriptomic survey of human embryonic cerebral cortex by single-cell RNA-seq analysis.", Cell Res, 2018 Jul;28(7):730-745
@Davidwei7 sorry for the delayed response. I've investigated issues with these protocols and updated the source code to support it.
Please note that this protocol by Fan et al. (2018) is significantly modified from the originally published data from Islam et al. (2011).
We modified the STRT-seq method for amplification of single-cell transcriptomes by changing the reverse transcription primer, the induced cell barcode, and the unique molecular identifier (UMI).
This requires a different bioinformatics approach.
Raw reads were first segregated based on the cell-specific barcode information in read 2 of the pair-ended reads. Then, sequences in read 1 were trimmed with customized scripts to remove the TSO sequence, the polyA tail sequence and sequences with low-quality bases (N > 10%) or contaminated with adapters. Subsequently, the stripped read 1 sequences were aligned to the hg19 human reference genome.
Therefore I have created separate technology settings: "strt-seq" for the original protocol and "strt-seq-2018" for the custom version. I've pushed this new configuration to the "dev" branch so it is possible to update to the development version to try it. There are minor changes to the source code so I expect it will run without errors. I've tested it on raw SRA data in FASTQ format from both publications and confirmed it created Cell Ranger compatible files.
Closing this issue as this technology is now supported. Raw reads from SRR6026844 tested without errors. Please re-open or file another issue if there are still problems with your environment preventing you from replicating this.
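To check a FASTQ file against the read layout quoted from the paper above, a rough sketch like the following may help. It assumes that read 2 starts with the 8 bp cell barcode followed directly by the 8 bp UMI; that exact order and offset is an assumption here, not something confirmed in this thread. `split_bc_umi` and `count_barcodes` are hypothetical helper names.

```shell
# split_bc_umi FASTQ
# Print "barcode<TAB>UMI" per read, assuming bases 1-8 are the cell barcode
# and bases 9-16 are the UMI (an assumed layout, see the note above).
split_bc_umi() {
  awk 'NR % 4 == 2 { print substr($0, 1, 8) "\t" substr($0, 9, 8) }' "$1"
}

# count_barcodes FASTQ
# Tally reads per putative 8 bp barcode, most frequent first.
count_barcodes() {
  split_bc_umi "$1" | cut -f1 | sort | uniq -c | sort -k1,1nr
}

# Example (placeholder file name):
# count_barcodes SRR6026844_sra_S1_L001_R2_001.fastq | head
```

If the layout is right, a small number of barcodes should account for most reads; a flat distribution would suggest the barcode sits elsewhere in the read.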
Hi!
Has this been added to the main branch and incorporated into what is installed in the docker? I don't have an option for strt-seq-2018 in my docker installation.
v1.2.7 has been merged and released on GitHub. Docker builds are in progress and will be available soon.
@adc0032 @Davidwei7 @kbattenb The latest version (1.2.7) passed docker builds and is now available on dockerhub: https://hub.docker.com/r/tomkellygenetics/universc/tags
This version supports STRT-Seq, PIP-Seq, and VASA-Seq protocols. I have some minor changes to versioning and issues #17 or #20 under consideration but the above technologies should work resolving the above issues #12 and #16.
@TomKellyGenetics
Thank you for getting this updated!
Should strt-seq-2018 still be undergoing file format conversion in line 3138?
I don't see it included here in the STRT-Seq section.
I think not, since UMIs are already included in the 2018 custom protocol. It may be necessary to remove the TSO sequence as described in the paper (by hard trimming R1) if you are using (paired-end) 10x 5' scRNA chemistry settings. I think it is not necessary to perform TSO conversion on the R2 after the barcode and UMI, as you can just use 10x 3' scRNA chemistry settings, which ignore the rest of this read.
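If hard trimming of R1 does turn out to be needed, it could look roughly like this sketch. `trim_tso` is a hypothetical helper, and the default sequence is simply the 30 bases that start several of the R1 reads posted in this thread; whether that matches the TSO used by the authors is an assumption. Only a leading exact match is removed, and the quality string is trimmed by the same length.

```shell
# trim_tso FASTQ [TSO]
# Remove TSO from the start of each read when it is an exact prefix, and
# trim the quality line to match. Reads without the prefix pass unchanged.
trim_tso() {
  awk -v tso="${2:-AAGCAGTGGTATCAACGCAGAGTACATGGG}" '
    NR % 4 == 1 { header = $0; next }
    NR % 4 == 2 {
      # n = number of leading bases to drop for this record
      n = (index($0, tso) == 1) ? length(tso) : 0
      bases = substr($0, n + 1)
      next
    }
    NR % 4 == 3 { plus = $0; next }
    NR % 4 == 0 {
      print header; print bases; print plus; print substr($0, n + 1)
    }
  ' "$1"
}

# Example (placeholder file names):
# trim_tso SRR6026844_sra_S1_L001_R1_001.fastq > trimmed_R1.fastq
```

This does not handle repeated TSO copies, mismatches, or polyA tails, and a read that consists only of the TSO becomes an empty record, so a dedicated trimmer is likely a better choice for real analyses.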
Dear Sir/Madam, Hope you are well. After resolving the problem with the docker image on my cluster, I tried my first run of launch_universc.sh with the STRT-Seq technology. I ran everything in a singularity image converted from the docker image. My command is this:
launch_universc.sh -R1 SRR6026844_sra_S1_L001_R1_001.fastq -R2 SRR6026844_sra_S1_L001_R2_001.fastq -t strt-seq -r /lustre/project/m2_jgu-canshank3/Comparison/Human/HomSap_GRCh38 -i SRR6026844
Please see the below snapshots of the processes and errors:
Please see below the first few rows of the fastq files:
head -n 24 SRR6026844.sra_S1_L001_R1_001.fastq @SRR6026844.sra.fastq.1 1 length=150 AAGCAGTGGTATCAACGCAGAGTACATGGGGAAAAAGAGAAAAGTGGAGGGATGTGTGGGCCTAGACAGGGGAAAAAGGAGAACAGGAGGCTCCAGACTGGTGAGGAAGGGGAGTGGGCTGGGCGTGCGGCTCATGCCTGTCATCCCAGC +SRR6026844.sra.fastq.1 1 length=150 AA<<FFJJJFJJJJJJAJJJJJFA<JFJF<7FFFJJ--FJJJJA-F-AJJF<7FAA-FFJJ<AJJJFJJJ--7AJAJJFFJ<J7AJFA<FJ-7-AAJ7JF<<F7AJAAFJ7--777FJJFAA<JA-AJFAJJ-<7<7<FFAJF-FFAF7F @SRR6026844.sra.fastq.2 2 length=150 AAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTGGTATCAACGCAGAGTACATGGGAAGCAGTCGTATCAAAGCAGAGTACATGGG +SRR6026844.sra.fastq.2 2 length=150 AA7FAF7-FFJJJJJJFJJJJJAAJJ7FFJJJJJJJJJJJJJJJJJJJJJJJFJJJFJJJAJJJJJJJJJAJJJJJJJJJJJ-<JJ<J<<-FJA<-<--A7AFJ--7AJJ<<-FF-FJAJA-A<-7F-7AA<--7---FF-)--<AJFJJ @SRR6026844.sra.fastq.3 3 length=150 AAGCAGTGGTATCAACGCAGAGTACATGGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA +SRR6026844.sra.fastq.3 3 length=150 AA<FFF<FFFJJJJJJJFJ<JF<JFJ<A<7-FJJJJJFJFJFJJJJJJJJJJFJJJJJJJJJJJJJJJJAJFJJJJ<FJ-FJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJFJJJJAJJJAJFFJJJJJFFJJJJJFJJJFFAA7FFJ @SRR6026844.sra.fastq.4 4 length=150 TGACCTGTCCCCTCTGGCTGCCTCTGAGTCTGAATCTCCCAAAGAGAGAAACCAATTTCTAAGAGGACTGGATTGCAGAAGACTCGGGGACAACATTTGATCCAAGATCTTAAATGTTATATTGATAACCATGCTCAGCAATGAGCTATT +SRR6026844.sra.fastq.4 4 length=150 AA<-<7AFFJAFJJJF-7<FJFFAJJA7FFFJF7FJ7JJJF7FJAJFJFF7FFFFJ-FJJJ-<A-F<JFJ-<AA7-AJJJ<FFFJFFJF<JJAJF-FJA-FJJJ7FJJJJF<7A<FJFFJ-<AAJJJ7F7<F<FF7-<7<-<FAF<FAFJ @SRR6026844.sra.fastq.5 5 length=150 GGAAGGAAGGAAGAAAGAAAGAAAGATAGAGAGAGAGAGAGAGAGAGAAAGATAGAGAGAAATAAAGAAACAAAGAAAGAAAGAAAGAAAGAAAGAAAAAAAAAGAAAAATACAAAAAAAAAAATTCACTTAACTCAGGGGTTCGGAGAT +SRR6026844.sra.fastq.5 5 length=150 -A-FFF77F-F<<F<JJAF<AJFFJ----<<FJAFFF<FF<FJF<<AFJJJ<<FJJFJFJJ7-F<JAJJA---7A<7AF-<7-777AJAFJJ<-AJJ-F-<-A---A---77F7--7F<FAF<A-------7--7--A--))-)7)---7 @SRR6026844.sra.fastq.6 6 length=150 
CCTCCAGATACCACTGAGCCTCTTGCCCATGATTCAGAGCTTTCAAGGATAGGCTTTATTCTGCAAGCAATCAAATAATAAATCTATTCTGCTGAGAGATCACAAAAAAAAAAAAAAAAAAAAAAAAAAACCTATTTGCTGATGAGATCA +SRR6026844.sra.fastq.6 6 length=150 AA<7-7<<FFJJFFFJJJJJJJ<FJF<J<JFJJJJJJJJJJJJJFJJJFA7F<FJJJFFFJJAJFJ<FF-FA77JA-AJFFJFFA7-FA-FJJJ-AFJ----<F-<AJJA<<--AF-7AFA--AF-A<--7-7-AA<---7---7---7-
head -n 24 SRR6026844_sra_S1_L001_R2_001.fastq @SRR6026844.sra.fastq.1 1 length=150 NAGGTGCATTCGCCCTCCGTAGAAATCCATGCCAAGTACGCTCCTTCCATTGATTTTCTTGGATCGGGTGTGCACCGCGTAGCTCAGCATGGCAAGTCTGTGTAGTCCGTGGACCCGCCAGGACCCCCCGCCGCACGAGACGCAATACGT +SRR6026844.sra.fastq.1 1 length=150
AAA--A--777-A---AA--7------7-7))--)---7)-7----7--------7--77-----7))-))7-7)<--))77-7--)---))--)----7-----7-))7))-)))))-))))-))))-)-)-)))))-))))7---7-
@SRR6026844.sra.fastq.2 2 length=150 NTATGACTCCACCCCTCAGAGAGGAGGAGGCGACGGGGACAACAACTCACAGAGAGCAAAGTCCGTGGCAACCACCCCGTCTGCGGAGAGCAGGTCCGACCCTACTAGACGAGAGACAACGAACGCCGGACCGCACAATGGCGAGAGCTA +SRR6026844.sra.fastq.2 2 length=150
<A-----A---7A--A--<-7---))7)7--7)-)---77<F-7-7--7---7-77------77--)))-7)--))))))))-))7)7))-)))))))))))-)----7--)-)----------)))))))))7-)<----))-)))-7
@SRR6026844.sra.fastq.3 3 length=150 NCCACATATAGGGAAACATTTTAATTCTTAGTTATTATTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATTTTTTTTTTTTTTTTTTTTTTTTCATTTTTTTTATTT +SRR6026844.sra.fastq.3 3 length=150
AAAA-F-A<-A--7----------7--------7------77--F<----7<-<7A<A--AA7-A-----F-<<-<-7--7--7-<77A<7A7-A<-A-F--<A<<-------------A<7-F7-------7----7----7---7--
@SRR6026844.sra.fastq.4 4 length=150 NACTTCGATATAAGATTTTTTTTTTTATTTATTACTCAAAGTTTAGAACATTTTATTAAAGTACAAAAATGTTAGAATTTAGCTAATAGAAAAACATAGTAAATATTTAAAAAAACGCTTATAAAATTACTCAAGGCACCCACAGAAAAC +SRR6026844.sra.fastq.4 4 length=150
AAFFAJ-FAJFJJJJ----------------------A---7-77-FFF----7--<-A-7A<--7<----A-A7<---77<-7<-77<FJ77-<-7--AAF-A-------AFJF<---7-<7--7-<----7--)-))-))))7-7--
@SRR6026844.sra.fastq.5 5 length=150 TCGAAGTATGGTGATATCGGAAGAGCTTCGAGTACGTAAATAGTGTAGATCTCGGTTGTCGTCTTATCATTAAAAAAACATTTCTTACTTTTCTCTCTTCGCACACCTCACTTCCTCGCTATATTGCTTCCTCCCTTCCGGGGACAGACC +SRR6026844.sra.fastq.5 5 length=150 --AF-FF<--7F7-A---<-7--------7-7---7---7-------7----<-)--)---------7---<------------------------7----)-))))-)------7-<))----7----7)--7<)-7<))----))--) @SRR6026844.sra.fastq.6 6 length=150 TATCAGCAAATAGGGTTTTTTTTTATTATTTTATTTTTTTTTTGATCCCTGAGGAGAATAGCGTTCATATGTGAGTTCTGGCAGAACAAAGGCTAACCTTGAAAGGCCTGTTATCTGGGACAGAAGCCCAGGAGTGCTCGTGTCTGTACC +SRR6026844.sra.fastq.6 6 length=150 AAAAAFF<FF-FJ-F----------------------------77---)-))7A-7<7----------7------------777)----7A--------7-<AA7<-)7-)-----77)))))----7)))-)7-))-)7)7-)------
Do you have any idea why the process was not completed?
I also have some information regarding the fastq file, which I am sharing here in case it helps resolve the problem I am facing.
Firstly, the sequence structure of the fastq file is this:
Secondly, the more detail on how the author analysed their data is in their github (link: [https://github.com/zorrodong/HECA/tree/master/scRNA-seq_pipeline_hg38]).
I am not sure whether I am correct in thinking this:
Should I add -b barcode_96_8bp.txt to my command? (barcode_96_8bp.txt is found in their github page: [https://github.com/zorrodong/HECA/blob/master/scRNA-seq_pipeline_hg38/barcode_96_8bp.txt]).
Should I use custom_8_8 in my command? Does our current UniverSC support this setting?
The authors used umi_tools in the pipeline to firstly extract UMI and barcodes from the raw fastq file. Do I need to do this first before using UniverSC? (the authors' pipeline is this: )
I am terribly sorry for giving so much information on my issue. I am quite new to complex bioinformatic problems and want to use your software for integrative analysis. I have three datasets, generated with BD Rhapsody, 10x Genomics, and STRT-Seq (described above), and because the STRT-Seq fastq file described in this post is the main reference data we are comparing against, I want to do this correctly.
Thanks for developing this tool, and looking forward to your response.
David