lskatz / lyve-SET

:dancer: :palm_tree: LYVE-SET, a method of using hqSNPs to create a phylogeny, especially for outbreak investigations
MIT License

Getting Assembly metrics #86

Open tbazilegith opened 2 years ago

tbazilegith commented 2 years ago

Hi lskatz, I need to compute assembly metrics, including coverage. My input files are single-end FASTQ reads and the genome assembly in FASTA format. According to the documentation, paired-end reads must be shuffled; since my reads are single-end, I did not shuffle them. In the command below, genome is the size of the genome:

lyveset_1.1.4f.sif run_assembly_readMetrics.pl se_read.fq.gz -e ' + genome.astype(str) + ' > se_read'_readMetrics.txt'

The coverage value isn't computed: it came out as a dot (.) for all my samples, and I got a (yes) for the avg-quality.

File avgReadLength totalBases minReadLength maxReadLength avgQuality numReads PE? coverage readScore medianFragmentLength
se_read_readMetrics.txt 3 3396752 3396752 1 6125697 38 yes 1.00 1 .

I added the flag --singleend as you recommended, but the command failed:

'lyveset_1.1.4f.sif run_assembly_readMetrics.pl --singleend se_read.fq.gz -e ' + genome.astype(str) + ' > se_read'_readMetrics.txt'

Is there anything I should add to troubleshoot this? Did I use the wrong script with --singleend? (The quotes ' ' appear because the command is run from Python through a Singularity container.) Thanks, TJ
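Because the command above is assembled by string concatenation inside Python, quoting mistakes are easy to introduce. A minimal sketch of building the same call as an argument list is shown here. The script name, the input file name and the -e option come from this thread; launching the image with `singularity exec`, the helper name `build_read_metrics_command`, and the genome size of 5000000 are assumptions made for illustration only.

```python
import shlex


def build_read_metrics_command(image, reads, genome_size):
    """Return the argument list for one run_assembly_readMetrics.pl call.

    Assumption: the container is launched with `singularity exec`.
    """
    return [
        "singularity", "exec", image,
        "run_assembly_readMetrics.pl",
        reads,
        "-e", str(int(genome_size)),  # expected genome size, as passed in the thread
    ]


# Hypothetical genome size; replace with the real value for the organism.
argv = build_read_metrics_command("lyveset_1.1.4f.sif", "se_read.fq.gz", 5000000)

# Print the command for inspection instead of running it.
print(shlex.join(argv))
```

To actually run it, the list can be passed to `subprocess.run(argv, stdout=handle, check=True)` with `handle` being an open output file, which avoids shell quoting and redirection altogether.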

lskatz commented 2 years ago

Hi TJ, you are correct that --singleend is not a parameter for this. Are you saying that the columns are mismatched? Do you see it line up better if you run column -t se_read_readMetrics.txt? And then I guess one more issue is that coverage is not computed. You are correct; that value is only determined for bam files since there is no assembly to compare the raw reads to.
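If the coverage column stays empty, a rough depth estimate can still be derived by hand from the totalBases column. The sketch below uses the generic estimate of total sequenced bases divided by genome size; it is not taken from the lyve-SET code and may differ from what the script would report. The function name and the example numbers are hypothetical.

```python
def estimated_coverage(total_bases, genome_size):
    """Rough sequencing depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Hypothetical example: 150 Mb of reads against a 5 Mb genome.
print(estimated_coverage(150_000_000, 5_000_000))  # 30.0
```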

Could you show the output of the column -t command?

tbazilegith commented 2 years ago

Hi lskatz, Actually, the columns match, but some of them have no values computed, such as the coverage column. Here is the output of column -t. First and second rows are column headers.

File avgReadLength totalBases minReadLength maxReadLength avgQuality numReads PE? coverage readScore medianFragmentLength
LRtrimmedfastqs/sample_21547.fq.gz 1.00 1 Inf 0.00 1 yes 0.00 . .

Thanks, TJ
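One way to check how the pasted values line up with the headers is to pair them explicitly. The sketch below does this for the header and data row quoted above; the helper name `pair_columns` is hypothetical, and splitting on whitespace is an assumption that holds only if no field contains spaces. As pasted, the row has ten fields against eleven headers, so empty fields may have been lost in the paste and the pairing should be read as a diagnostic only.

```python
from itertools import zip_longest

HEADER = ("File avgReadLength totalBases minReadLength maxReadLength "
          "avgQuality numReads PE? coverage readScore medianFragmentLength")
ROW = "LRtrimmedfastqs/sample_21547.fq.gz 1.00 1 Inf 0.00 1 yes 0.00 . ."


def pair_columns(header_line, data_line, missing="<no value>"):
    """Pair each header name with the value in the same position."""
    names = header_line.split()
    values = data_line.split()
    # zip_longest keeps unmatched headers visible instead of dropping them.
    return list(zip_longest(names, values, fillvalue=missing))


for name, value in pair_columns(HEADER, ROW):
    print(f"{name}\t{value}")
```

Running the same pairing on the original tab-delimited file, split on tabs rather than on any whitespace, would show whether fields are truly missing or merely empty.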