kfuku52 / amalgkit

RNA-seq data amalgamation for large-scale evolutionary transcriptomics
BSD 3-Clause "New" or "Revised" License

lib_layout may not be considered appropriately in amalgkit integrate #70

Closed kfuku52 closed 2 years ago

kfuku52 commented 2 years ago

If I understand correctly, amalgkit integrate collects info from only one file of each paired fastq set, so the line below reflects the total size of only one mate of the pair. Shouldn't this be multiplied by 2 for paired-end reads? https://github.com/kfuku52/amalgkit/blob/4e7898ed9609890046aea57cbaa9aea285da8e42/amalgkit/integrate.py#L74
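The concern can be sketched as below. This is only an illustration, not the code in integrate.py: the function name `adjusted_total_bases` and its parameters are hypothetical, and it assumes both mates contribute the same number of bases, which may not hold exactly after trimming.

```python
def adjusted_total_bases(sum_len_one_file: int, lib_layout: str) -> int:
    """Hypothetical helper: scale a one-file base count by library layout.

    Assumes that both files of a pair hold the same number of bases,
    so a paired library is roughly twice the size of one of its files.
    """
    layout = lib_layout.strip().lower()
    if layout == "paired":
        # Only one mate was measured, so double it to estimate the library.
        return sum_len_one_file * 2
    if layout == "single":
        return sum_len_one_file
    raise ValueError(f"unknown lib_layout: {lib_layout!r}")
```

If the statistics are already collected from both files, no such correction is needed, and applying it would double-count.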

Hego-CCTB commented 2 years ago

seqkit is smart enough to figure this out on its own if it knows the library layout, but this correction may need to be introduced when I make the changes to fix issue https://github.com/kfuku52/amalgkit/issues/69. It is something I probably would have overlooked.

Here is an example. These are two separate integrate runs on trimmed Drosophyllum samples. One is paired; the other is fake-single (I only had one of the read pairs in the fastq directory, so integrate inferred a single-end library): integrate_library_sum_len_test.zip
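A comparison like this can also be cross-checked independently of seqkit by summing read lengths straight from the fastq files. The sketch below is an assumption-laden illustration, not part of amalgkit: `fastq_sum_len` and `library_sum_len` are hypothetical names, and the parser assumes plain four-line FASTQ records (no wrapped sequences), optionally gzip-compressed.

```python
import gzip


def fastq_sum_len(path):
    """Sum the sequence lengths in one FASTQ file (4 lines per record assumed)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    total = 0
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            # In a four-line record, the second line holds the sequence.
            if i % 4 == 1:
                total += len(line.rstrip("\r\n"))
    return total


def library_sum_len(paths):
    """Total bases for a library: one path for single, two paths for paired."""
    return sum(fastq_sum_len(p) for p in paths)
```

Passing both files of a pair to `library_sum_len` should give roughly twice the value obtained from one file alone, which is the difference expected between the paired and fake-single runs.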

kfuku52 commented 2 years ago

Looks good! Thank you for testing it.