jpuritz / dDocent

a bash pipeline for RAD sequencing
ddocent.com
MIT License

Info about the number of mapped reads etc. #90

Closed: Anto007 closed this issue 2 months ago

Anto007 commented 2 months ago

Hi @jpuritz @pdimens

Great pipeline, and thank you for sharing it with the community! Could you let me know where in the dDocent output I can find information such as the number of reads mapped to the reference transcriptome for each individual, mean coverage, etc.?

Anto007 commented 2 months ago

If I look into the individual sample cov.stats files as below, is it correct that the last column represents the mean coverage values? Is it possible to also find somewhere in the output the info for the number of reads mapped here? Many thanks in advance @jpuritz @pdimens

head Sample1.cov.stats 
comp5   0   439 2
comp6   0   67  0
comp10  0   505 2
comp12  3   388 5
comp14  18  314 21
comp22  0   298 13
comp24  0   355 4
comp27  0   408 0
comp29  0   365 0
comp31  0   392 6

jpuritz commented 2 months ago

This is the output of:

bedtools coverage -b Sample1-RG.bam -a mapped.bed -counts -sorted -g genome.file > Sample1.cov.stats

It's BED format: contig, start coordinate, end coordinate, and the count of reads that overlap the interval. So the last column is a read count, not mean coverage.

Hopefully, Sample1 is just an example name, but you should be following the dDocent naming convention PopulationIdentifier_SampleIdentifier. Also, dDocent is not designed for transcriptomics.
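Given the four-column layout described above, a per-sample summary can be pulled from a cov.stats file with awk. The following is a minimal sketch, not part of dDocent itself: it rebuilds the ten lines posted earlier in this thread into a temporary file (a stand-in for a real Sample1.cov.stats) and reports the number of intervals, how many have at least one read, and the total of column 4. The variable names are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in for a per-sample file such as Sample1.cov.stats:
# the ten intervals posted in this thread, written tab-separated.
stats_file=$(mktemp)
printf '%s\t%s\t%s\t%s\n' \
  comp5  0  439 2 \
  comp6  0  67  0 \
  comp10 0  505 2 \
  comp12 3  388 5 \
  comp14 18 314 21 \
  comp22 0  298 13 \
  comp24 0  355 4 \
  comp27 0  408 0 \
  comp29 0  365 0 \
  comp31 0  392 6 > "$stats_file"

# Column 4 is the number of reads overlapping each interval.
# Sum it, count intervals with at least one read, and average per interval.
summary=$(LC_ALL=C awk '
  { total += $4; if ($4 > 0) covered++ }
  END {
    printf "intervals=%d covered=%d reads=%d mean_reads_per_interval=%.1f",
           NR, covered, total, total / NR
  }' "$stats_file")

echo "$summary"
rm -f "$stats_file"
```

Note that this total counts read/interval overlaps, so a read spanning two intervals would be counted twice, and reads per interval is not the same thing as mean depth. For an exact count of mapped reads, running something like samtools flagstat on the sample's BAM file (Sample1-RG.bam in the command above) is the more usual route.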

Anto007 commented 2 months ago

@jpuritz Many thanks for your prompt response, much appreciated! Yes, Sample1 is just an example name and I'm following the dDocent naming convention. My input data is from ezRADseq. I notice that there are also cov.stats and cov.split.stats files in my results directory. Do they represent the mean read counts across all the samples that were analyzed?

jpuritz commented 2 months ago

No. cov.stats is the sum across samples. cov.split.stats is used for creating SNP-calling intervals and can be ignored.
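To illustrate what a sum across samples means here, the sketch below adds up column 4 per interval over several per-sample cov.stats files. It is only an illustration under stated assumptions: the two sample files and their counts are invented, the file names merely follow the PopulationIdentifier_SampleIdentifier convention, and this is not necessarily how dDocent builds its own cov.stats.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)

# Two hypothetical per-sample files (invented counts), tab-separated BED-like rows.
printf '%s\t%s\t%s\t%s\n' comp12 3 388 5  comp14 18 314 21 > "$workdir/PopA_001.cov.stats"
printf '%s\t%s\t%s\t%s\n' comp12 3 388 7  comp14 18 314 4  > "$workdir/PopA_002.cov.stats"

# Key each row on contig/start/end and add column 4 across all sample files.
combined=$(cat "$workdir"/*.cov.stats |
  awk 'BEGIN { OFS = "\t" }
       { key = $1 OFS $2 OFS $3; sum[key] += $4 }
       END { for (k in sum) print k, sum[k] }' |
  sort -k1,1 -k2,2n)

echo "$combined"
rm -rf "$workdir"
```

With these invented inputs, comp12 ends up with 12 reads (5 + 7) and comp14 with 25 (21 + 4). Dividing each summed count by the number of samples would give the per-interval mean that the question asked about.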

Anto007 commented 2 months ago

Thank you very much again