Closed Roleren closed 3 weeks ago
Thanks for the suggestions. While I do not have immediate plans to support this functionality, a PR is always welcome.
I have already forked the repo, I think I see how it can be done.
Your final processing before ORF prediction is this:
merge_read_lengths()
It claims to return:
Returns
-------
merged_alignments: dict(dict)
alignments by merging all lengths
I see the dict is structured like this:
count = alignments[length][strand][(chrom, pos)]
So it is a dict, by length -> strand -> then a double key on chrome and pos.
In the final return, you remove the length key and merge the results of all lengths.
Resulting in this dict structure:
merged_alignments[strand][(chrom, pos_shifted)]
So if I understand correctly, I would need to convert the bigwig (which is split on strand) For each value in bigwig per chromosome, input the shifted read position, what you call (pos_shifted)
I then could call the main detection function: export_orf_coverages() using this dict I made from the bigwig. This is possible, and of course then the metagene plots are skiped, since it is presumed the user already did this correctly.
I think it sounds good, what do you think? Would you be interested to put it as a separate function in the cli if I make a PR, so that people could easily call if from directly from bigwig?
Would make your already quite fast tool, finish in no time.
Your understanding is correct. I will just elaborate it below.
merge_read_lengths(alignments, psite_offsets = dict([read_length, 0] for read_length in alignments.keys()))
In principle, all that needs to be modified is a helper function that enables bigwig -> alignments
dict conversion. Does this make sense?
Cosing due to inactivity.
Is your feature request related to a problem? Please describe. This tool can not input already shifted data, makes it less useful for the community, since sometimes large bigwig files can be used, which are not possible to fit in a bam file.
Describe the solution you'd like optional input for bigwig (ORFquant supports this with very little tinkering already)
Describe alternatives you've considered One alternative is either to use another tool, or I can also just hack a solution by creating a fork that skips to the step where you have the alignments as p sites
Additional context For very large Ribo-seq files (What would have been a > 1 TB bam file etc) With bigwig, it can be loaded basically instantly.