Closed: yueli8 closed this issue 4 years ago
Hello Yue, Sorry that I did not provide the demo data on GitHub. It was meant to be a small CEL-Seq2 sequencing FASTQ file. I am afraid I no longer work at NYU, so I cannot provide the data. However, here are two solutions.
If possible, you could use the data from the original CEL-Seq2 paper. It seems each single cell has its own FASTQ, so you would need to merge them to run this celseq2 pipeline.
Alternatively, I found that I had implemented a generator of dummy CEL-Seq2 sequencing reads. It randomly generates a pair of R1 + R2 FASTQ files, plus a GTF and a FASTA of a dummy species.
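The merge mentioned above could be sketched as a plain concatenation. This is only an illustration: the helper name `merge_fastq` and all file names are hypothetical, and it assumes uncompressed FASTQ files that each end with a newline. Whether concatenation alone is appropriate for the paper's data depends on how those per-cell files encode the cell barcode, which is not verified here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of celseq2): concatenate per-cell FASTQ
# files into a single file.
# Usage: merge_fastq OUTPUT INPUT [INPUT...]
merge_fastq() {
    local out="$1"
    shift
    # Each FASTQ record is a self-contained 4-line block, so concatenating
    # files that end with a newline gives a valid FASTQ file.
    cat "$@" > "$out"
}

# Example with placeholder file names. List the cells in the same order
# for R1 and R2 so that read pairs stay in sync:
# merge_fastq merged_R1.fastq cell01_R1.fastq cell02_R1.fastq
# merge_fastq merged_R2.fastq cell01_R2.fastq cell02_R2.fastq
```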
Once you install celseq2, these Bash commands are available to you.
# generate dummy species
celseq2-dummy-species --gtf a/b/c.gtf --fasta a/b/c.fasta
# generate dummy CEL-seq2 reads
celseq2-simulate --gtf a/b/c.gtf --fasta a/b/c.fasta --savetor1 R1.fq --savetor2 R2.fq
This way you will have everything technically needed to run the celseq2 pipeline.
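Once both read files exist, a quick sanity check is that R1 and R2 hold the same number of reads. The sketch below is not part of celseq2; the helper names `fastq_read_count` and `check_pair` are made up for illustration, and it assumes uncompressed FASTQ with exactly four lines per record.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of reads in an uncompressed FASTQ
# file, assuming the usual 4 lines per record.
fastq_read_count() {
    local lines
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Hypothetical helper: succeed only if both files hold the same number of reads.
check_pair() {
    [ "$(fastq_read_count "$1")" -eq "$(fastq_read_count "$2")" ]
}

# Example with the file names used above:
# check_pair R1.fq R2.fq && echo "R1 and R2 have the same number of reads"
```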
Best, Yun
Hello, Yun,
Thank you so much for your detailed explanation!
Thank you again, it is really appreciated!
Best,
Yue
li@-Desktop-590-p0xxx:~/celseq2-master$ celseq2-simulate --gtf Danio_rerio.GRCz10.87.gtf --fasta Danio_rerio.GRCz11.dna.toplevel.fa --savetor1 SRR9609653_1.fastq --savetor2 SRR9609653_2.fastq
Traceback (most recent call last):
  File "/home/li/.local/bin/celseq2-simulate", line 8, in <module>
    sys.exit(main())
  File "/home/li/.local/lib/python3.6/site-packages/celseq2/dummy_CELSeq2_reads.py", line 264, in main
    gzip=args.gzip)
  File "/home/li/.local/lib/python3.6/site-packages/celseq2/dummy_CELSeq2_reads.py", line 183, in dummy_CELSeq2
    min_qual=default_qual)
  File "/home/li/.local/lib/python3.6/site-packages/celseq2/dummy_CELSeq2_reads.py", line 110, in dummy_readquality
    assert not length is None, 'Specify length.'
AssertionError: Specify length.
li@-Desktop-590-p0xxx:~/celseq2-master$ head -n 12 SRR9609653_1.fastq
@MN00336:7:00CELSEQ2:1:78888:10964:367 1:N:0:ATGCGC
AAAAAAAGACTC
+
,@/9?C9:+86E
@MN00336:7:00CELSEQ2:1:20266:32663:7 1:N:0:ATGCGC
AAAAATAGACTC
+
.-?/D.;8,+GI
@MN00336:2:00CELSEQ2:1:42122:46947:341 1:N:0:ATGCGC
AAAAAGAGACTC
+
.1<76.0.4;;/
li@-Desktop-590-p0xxx:~/celseq2-master$ head -n 12 SRR9609653_2.fastq
@MN00336:7:00CELSEQ2:1:78888:10964:367 2:N:0:ATGCGC
CATTCTTCACAGAGTATCTGCAGGTATTGATCACCCCTGATCAGTTATTG
+
;B80ED6-::B=:<6,,HE4F;.E=G@->CF5;@0EC,=60=I<.:CHA,
@MN00336:7:00CELSEQ2:1:20266:32663:7 2:N:0:ATGCGC
GCGGAAGCCCCGCTCCAGCTAAACACCCAGTGGTTCGCTCTCAGAGTTAT
+
.AIGF5A:H@EE6A-1H7@<3=G5500D@H?H1016I19,+;>:GEH<3/
@MN00336:2:00CELSEQ2:1:42122:46947:341 2:N:0:ATGCGC
GCAGCTCCTGCTTTCATTCAATTAGTGTGATTACAGACAAACGTCATCAA
+
22-9HDC954+1I/<8AI4HB:<88D->@A/B?C>-B?01CD,0I7,9@:
No. My simulation is really naive, so it only works for the dummy species I generated. Here it seems you ran it directly on zebrafish.
Hello, Puriney,
Thank you so much for your message!
Only chr1 comes out. Is that correct?
li@Desktop-590-p0xxx:~/celseq2-master$ celseq2-dummy-species --gtf Caenorhabditis_elegans.WBcel235.99.gtf --fasta Caenorhabditis_elegans.WBcel235.dna.toplevel.fa
End of chr1 is 4200
li@Desktop-590-p0xxx:~/celseq2-master$ celseq2-simulate --gtf Caenorhabditis_elegans.WBcel235.99.gtf --fasta Caenorhabditis_elegans.WBcel235.dna.toplevel.fa --savetor1 GSE78779_C_elegans.fastq
usage: celseq2-simulate [-h] [--gtf FILE] [--fasta FILE] --savetor1 FILE
--savetor2 FILE [--expected-alignment FILE] [--gzip]
[--test] [--verbose]
celseq2-simulate: error: the following arguments are required: --savetor2
But when I used the following command, only one FASTQ file came out (SRR3196087_1.fastq), not the two R1 + R2 files.
li@Desktop-590-p0xxx:~/sratoolkit.2.9.6-1-ubuntu64/bin$ fastq-dump --split-files SRR3196087.sra
Read 1273706 spots for SRR3196087.sra
Written 1273706 spots for SRR3196087.sra
Thank you again, I really appreciate any help!
Best,
Yue
The simulation creates a fake genome that I designed myself, so don't use any real species data with it. The goal is simply to be able to run the pipeline.
The celseq2-dummy-species command will generate the new GTF and FASTA files for this fake genome.
The celseq2-simulate command will generate random reads for the fake species.
From then on, treat the fake species like human/mouse genome data and do the routine pre-processing.
Run bowtie2 / STAR to index fake_species.fa.
Follow the tutorial to set up the pipeline.
Use the R1.fq, R2.fq, fake_species.gtf and the aligner index to run the pipeline.
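The steps above can be sketched as one script. This is a sketch under assumptions: the directory layout, the `fake_species.*` and `R1.fq` / `R2.fq` file names, and the helper functions `run` and `prepare_dummy_inputs` are placeholders, and bowtie2 is picked as the aligner (STAR would need its own indexing command). The flags for the celseq2 tools are the ones shown earlier in this thread. The final pipeline run is left as a comment, since its setup follows the tutorial.

```shell
#!/usr/bin/env bash
# Set DRY_RUN=1 to print the commands without executing them.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Hypothetical wrapper: prepare all inputs for a test run inside directory $1.
prepare_dummy_inputs() {
    local dir="$1"
    run mkdir -p "$dir/index"
    # 1. Generate the GTF and FASTA of the fake species.
    run celseq2-dummy-species --gtf "$dir/fake_species.gtf" --fasta "$dir/fake_species.fa"
    # 2. Generate random paired reads for the fake species.
    run celseq2-simulate --gtf "$dir/fake_species.gtf" --fasta "$dir/fake_species.fa" \
        --savetor1 "$dir/R1.fq" --savetor2 "$dir/R2.fq"
    # 3. Build the aligner index (bowtie2 chosen here as an assumption).
    run bowtie2-build "$dir/fake_species.fa" "$dir/index/fake_species"
    # 4. Set up the config and experiment table as described in the celseq2
    #    tutorial, pointing them at R1.fq, R2.fq, fake_species.gtf and the index.
}

# Example: preview the commands first, then run them for real.
# DRY_RUN=1 prepare_dummy_inputs demo
# prepare_dummy_inputs demo
```

Previewing with DRY_RUN=1 makes it easy to confirm that the reads are simulated from the generated fake species files and not from a real genome.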
Hello, Puriney,
Thank you so much for your great help!
It works!
Thank you again!
Best,
Yue
Glad to hear that. Thanks for using our tool.
Hello,
Thank you for developing such nice software, CEL-Seq2. Where can I download the demo data listed in the Experiment Table (lane1-R1.fastq.gz, lane1-R2.fastq.gz, lane2-R1.fastq.gz, and lane2-R2.fastq.gz) on the website https://github.com/yanailab/celseq2, or in the file config_single_lib.yaml?
Thank you in advance for your great help!
Best,
Yue