leeju-umich / Cho_Xi_Seqscope

Seqscope codes

Input 1st fastq downloaded from SRA does not have location information. #1

Closed ShenXilin closed 3 years ago

ShenXilin commented 3 years ago

In the extractCoord.sh script, the first instruction is:

zcat $miseq | sed -n '1~4s/:/ /gp' | cut -d ' ' -f 4-7 >  ./pos-MiSeq-temp.txt

I downloaded the Liver_SeqScope_1st fastq files from SRA using prefetch (2.9.3):

prefetch -O .  SRR14082753
fastq-dump --split-files SRR14082753.sra

and SRR14082751_1.fastq looks like this:

@SRR14082751.1 1 length=25
NAGACGACTCTCCCCGCTATAGATN
+SRR14082751.1 1 length=25
#8ACCGGGGGGEF@CFGGGGFGGG#
@SRR14082751.2 2 length=25
NTCAGCAAGAAGCCCCATCGAGATN
+SRR14082751.2 2 length=25
#8ACCGEGGGGGCFFGGGGGGGGG#
@SRR14082751.3 3 length=25
NTAATCAATACGCCGCGGTTAGATN

However, the lines starting with @ in the downloaded fastq do not contain any : characters, so sed -n '1~4s/:/ /gp' has nothing to substitute and cannot extract the position information.
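To illustrate the mismatch, here is a minimal sketch that runs the same sed/cut step on two single records. The Illumina-style header (instrument M00000, flowcell 000000000-XXXXX, tile 2103, coordinates 15000:1300) is made up for illustration; only the SRA record is copied from the output above. The helper name extract_coords is hypothetical, and the 1~4 address assumes GNU sed.

```shell
# Hypothetical helper wrapping the header-parsing step from extractCoord.sh.
# Reads FASTQ on stdin; requires GNU sed for the 1~4 (every 4th line) address.
extract_coords() {
  sed -n '1~4s/:/ /gp' | cut -d ' ' -f 4-7
}

# Made-up record with a colon-delimited Illumina-style header:
# @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
illumina_record='@M00000:1:000000000-XXXXX:1:2103:15000:1300 1:N:0:1
NAGACGACTCTCCCCGCTATAGATN
+
#8ACCGGGGGGEF@CFGGGGFGGG#'

# Record as returned by fastq-dump (copied from the output above).
sra_record='@SRR14082751.1 1 length=25
NAGACGACTCTCCCCGCTATAGATN
+SRR14082751.1 1 length=25
#8ACCGGGGGGEF@CFGGGGFGGG#'

# Fields 4-7 of the split header are lane, tile, x, y.
printf '%s\n' "$illumina_record" | extract_coords

# No colon in the header, so the substitution never fires and nothing prints.
printf '%s\n' "$sra_record" | extract_coords
```

With the colon-delimited header, the step yields the lane, tile, x and y fields; with the SRA header it yields no output at all, which matches the empty coordinate file.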

I am confused. Is my input wrong? If so, how can I get the correct 1st fastq input data?

Many Thanks.

leeju-umich commented 3 years ago

It seems that SRA erased all coordinate information to reduce the file size. The raw 1st-Seq MiSeq files, as well as their extracted coordinates, can be found in the Deep Blue Data repository (https://doi.org/10.7302/cjfe-wa35).

sztankatt commented 3 years ago

@ShenXilin Thanks for the reply. For the liver samples, which 2nd-seq fastq.gz files should we use? I noticed that there are 7 paired-end runs. So far I have downloaded the first one (the one with 69 million reads) and tried analysing it using barcodes from tile 2103. However, I get very few counts per barcode, mostly 0s. Could you advise which fastq.gz file to start with?

leeju-umich commented 3 years ago

To get enough information to reproduce our work, I would suggest you combine all results. Thank you for the question.

--

Jun Hee Lee, Ph.D. Associate Professor Department of Molecular and Integrative Physiology University of Michigan Geriatrics Center 109 Zina Pitcher Place, 3019 BSRB Ann Arbor, MI 48109-2200 Office: 734-764-6789 | Lab: 734-764-6795 Fax: 734-936-9220 | E-mail: @.*** https://lee.lab.medicine.umich.edu/

pleasebelucky commented 2 years ago

@leeju-umich Thanks for the reply. I noticed the first six fastq files are called 2nd1 to 6_R1, while the last is called sra-pub-run-11. Does this last file still belong to the liver tissue? Also, by combining all results, do you mean adding up all UMI counts with the same gene and barcode? Thank you!

leeju-umich commented 2 years ago

For liver data, there should be seven 2nd-Seq files, as annotated in the SRA metadata. All sequencing data are from the same library source, so you can combine the results and treat them as one experiment. Best, Jun Hee
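One possible reading of "combine the results" is to sum per-run count tables over identical (barcode, gene) pairs. The sketch below assumes that reading, which is not confirmed in this thread, and also assumes a hypothetical three-column, tab-separated layout (barcode, gene, count) that is not the pipeline's actual output format. The file names, barcodes, and counts are made up, and the helper name sum_counts is hypothetical. Note that summing per-run counts can count a UMI more than once if the same molecule was sequenced in several runs.

```shell
# Hypothetical helper: sum the count column over identical (barcode, gene)
# pairs across any number of tab-separated tables: barcode<TAB>gene<TAB>count.
sum_counts() {
  awk -F '\t' -v OFS='\t' '
    { total[$1 OFS $2] += $3 }
    END { for (key in total) print key, total[key] }
  ' "$@" | LC_ALL=C sort
}

# Two tiny made-up per-run tables standing in for separate 2nd-Seq runs.
tmp=$(mktemp -d)
printf 'AAAC\tAlb\t2\nAAAC\tApoa1\t1\n' > "$tmp/runA.tsv"
printf 'AAAC\tAlb\t3\nGGGT\tAlb\t1\n'   > "$tmp/runB.tsv"

# Prints one line per (barcode, gene) pair with the summed count.
sum_counts "$tmp/runA.tsv" "$tmp/runB.tsv"
```

Here the pair (AAAC, Alb) appears in both tables and is reported once with a count of 5, while pairs seen in only one run keep their original count.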

leeju-umich commented 2 years ago

Please note that, even though the library was the same, the sequencing was performed on several platforms, including HiSeq, NovaSeq, and BGISEQ.

All the best, Jun Hee
