wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License
419 stars, 47 forks

Confusion regarding arrow formatted files #364

Closed (TBradley27 closed 5 months ago)

TBradley27 commented 5 months ago

Hello,

The cramino documentation states that it can produce a file in arrow format, which can then be used with NanoPlot.

However, the NanoPlot documentation does not describe how to use this arrow file.

wdecoster commented 5 months ago

Hi,

Oh yes, I see that it is poorly described. Thanks for letting me know. The arrow files are, confusingly enough, the same as feather files, and I have now specified that in the documentation. You can run NanoPlot with --arrow to specify the arrow input files.
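
As a rough illustration of the invocation described above, here is a minimal Python sketch that assembles the command line. The --arrow flag is the one named in this comment, and -o is NanoPlot's output directory option. The helper name nanoplot_arrow_command, the file name reads.arrow, and the directory nanoplot_out are hypothetical placeholders.

```python
import shlex


def nanoplot_arrow_command(arrow_file, outdir="nanoplot_out"):
    """Build the argument list for running NanoPlot on one arrow/feather file.

    Hypothetical helper for illustration; it only assembles the command.
    """
    if not arrow_file:
        raise ValueError("an arrow file path is required")
    # --arrow is the flag named in this thread; -o sets the output directory.
    return ["NanoPlot", "--arrow", arrow_file, "-o", outdir]


cmd = nanoplot_arrow_command("reads.arrow")  # placeholder file name
print(shlex.join(cmd))
# To actually run it (requires NanoPlot on PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

The sketch deliberately stops short of executing anything, so it can be checked without NanoPlot installed.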

Best, Wouter

TBradley27 commented 5 months ago

Many thanks for this!

That is very helpful.

Just one quick, minor note: it would also be helpful if the table in the 'plots generated' section of the README had a column for arrow/feather formatted data.

Many thanks again! Thomas

wdecoster commented 5 months ago

Hmm, no, that wouldn't be accurate. An arrow file is essentially a dataframe of features, and which plots can be generated depends on how the file was created.
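
To make that dependency concrete, here is a small sketch of the idea: each plot needs certain feature columns, and a plot is only possible when all of its columns are present in the dataframe. The column names and plot names below are illustrative assumptions, not NanoPlot's actual schema or plot list.

```python
# Hypothetical mapping from plot name to the feature columns it needs.
# Column and plot names are illustrative, not NanoPlot's real schema.
PLOT_REQUIREMENTS = {
    "read length histogram": {"lengths"},
    "length vs. read quality": {"lengths", "quals"},
    "length vs. mapping quality": {"lengths", "mapQ"},
    "length vs. percent identity": {"lengths", "percentIdentity"},
}


def available_plots(columns):
    """Return the plots whose required columns are all present."""
    present = set(columns)
    return [
        name
        for name, needed in PLOT_REQUIREMENTS.items()
        if needed <= present  # subset test: every needed column exists
    ]


# A file written without quality columns supports fewer plots
# than one that carries every feature.
sparse_file = ["lengths", "percentIdentity"]
full_file = ["lengths", "quals", "mapQ", "percentIdentity"]

print(available_plots(sparse_file))
print(available_plots(full_file))
```

With pandas and pyarrow installed, the real column list of a file could be obtained with pandas.read_feather(path).columns and passed to a check like this one.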

TBradley27 commented 5 months ago

Thanks, that makes sense.

I generated an arrow formatted file from a sorted bam file. When I ran that file through NanoPlot using --feather, the report did not include plots relating to read quality scores or mapping quality scores. This differs from what I get when I pass the sorted bam file directly to NanoPlot using --bam.

wdecoster commented 5 months ago

Yes, that is as expected. In my opinion, read quality scores are less informative than sequence identity scores. Therefore, cramino doesn't extract or calculate them, so they're not in the arrow file. It is a matter of efficiency. If you care a lot about mapping quality, you could also use https://github.com/wdecoster/make_arrow

TBradley27 commented 5 months ago

Thanks for that, I will check it out. As the original issue has been fixed, I am happy for this issue to be closed.