lh3 / yak

Yet another k-mer analyzer
MIT License

yak count question with paired reads #10

Closed scottgeib closed 3 years ago

scottgeib commented 3 years ago

Under "getting started" on readme page, it describes:

for paired end: to provide two identical streams

./yak count -b37 -t32 -o sr.yak <(zcat sr.fq.gz) <(zcat sr.fq.gz)

What is the reason to feed the same reads to yak twice as identical streams? Or is this meant to be R1 and R2 read pairs as separate streams?

lh3 commented 3 years ago

yak count needs to read the input twice, but a stream can only be read once.
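
As a rough illustration of that point (this is not yak itself), the sketch below runs two passes over a single pipe and then over two identical streams. `reads` and `count_lines` are hypothetical stand-ins for the input data and for one pass over it.

```shell
# Hypothetical stand-in input: one FASTQ record (4 lines).
reads() { printf '@r1\nACGT\n+\nIIII\n'; }

# Hypothetical stand-in for one pass over the input: count the lines on stdin.
count_lines() { echo $(( $(wc -l) )); }

# One stream, two passes: the first pass drains the pipe, the second sees EOF.
one_stream=$(reads | { a=$(count_lines); b=$(count_lines); echo "$a $b"; })

# Two identical streams, one per pass: the producer is simply run twice.
two_streams=$(a=$(reads | count_lines); b=$(reads | count_lines); echo "$a $b")

echo "one stream:  $one_stream"    # 4 0
echo "two streams: $two_streams"   # 4 4
```

Passing the same `<(zcat ...)` twice in the README command plays the role of the second case: each pass gets its own fresh copy of the data.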

scottgeib commented 3 years ago

Thanks Heng! So the input should be both the F and R paired read files streamed twice together?

So in a folder containing test.R1.fastq.gz and test.R2.fastq.gz it would be:

yak count ….. <(zcat fastq.gz) <(zcat fastq.gz)  # same set of files streamed twice (so yak can find read pairs, one from each stream)

not

yak count ….. <(zcat R1.fastq.gz) <(zcat R2.fastq.gz)  # one stream of forward and one stream of reverse (in this case yak would be assuming paired read files, which yak does not assume)

lh3 commented 3 years ago

So the input should be both the F and R paired read files streamed twice together?

Yes
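
Following that answer, a sketch for the folder with test.R1.fastq.gz and test.R2.fastq.gz might look like the one below. It assumes bash (for process substitution) and uses `gzip -dc` in place of `zcat`. `all_reads` is a hypothetical helper, and the `-b37 -t32` options and the `sr.yak` output name are copied from the README example quoted above.

```shell
# Hypothetical helper: emit every read from both mate files as one stream
# (all of R1, then all of R2).
all_reads() { gzip -dc test.R1.fastq.gz test.R2.fastq.gz; }

# Run yak only if it is installed and both input files exist.
if command -v yak >/dev/null 2>&1 && [ -f test.R1.fastq.gz ] && [ -f test.R2.fastq.gz ]; then
  # Both streams carry the same R1+R2 data; yak reads one per pass.
  yak count -b37 -t32 -o sr.yak <(all_reads) <(all_reads)
fi
```

The two process substitutions are identical on purpose: they are two copies of the full read set, not one stream of forward reads and one of reverse reads.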