liampshaw / mobile-gene-regions

Analysing the genomic context of mobile genes

How to use different extraction lengths for upstream and downstream regions #15

Open Dx-wmc opened 1 month ago

Dx-wmc commented 1 month ago

Hi, thank you for developing this excellent software. While using it, I encountered a few issues and would like to suggest some improvements to help optimize and refine the software's functionality.

  1. Allow separate settings for upstream and downstream extraction lengths: I checked the code of analyze-flanking-regions.py and found that the script appears to have been intended to support separate upstream and downstream lengths, but there are problems with how it handles them (--flanking_region). I tried modifying it by adding upstream and downstream parameters, but the final plot did not change. I hope you can fix this part.
  2. Automatically detect and adjust extraction lengths: It would be helpful if the software could check how much sequence is actually available around the target region before extraction and adjust the extraction length accordingly, avoiding errors when the requested region runs past the sequence boundaries.
  3. Allow extraction lengths shorter than the set threshold: Because of the characteristics of next-generation sequencing, the upstream and downstream sequences available for different genes vary in length, and some do not reach the fixed length set by the user. If the software could adjust the upstream and downstream extraction lengths to the sequence actually available in such cases (i.e., allow extraction lengths shorter than the set threshold; see the attached image), it would greatly improve the flexibility and accuracy of data processing.
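A minimal sketch of how suggestions 1 to 3 could fit together, assuming 0-based, half-open gene coordinates on a contig string. The names `extract_flanks` and `FlankResult` are hypothetical and not taken from analyze-flanking-regions.py. The idea is to take separate upstream and downstream lengths, clamp each flank to the contig boundaries, and then either skip or keep genes whose flanks come up short.

```python
from dataclasses import dataclass
from typing import Optional

# Complement table for reverse-complementing DNA (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class FlankResult:
    region: str          # upstream + gene + downstream, in the gene's orientation
    upstream_len: int    # bases actually extracted upstream
    downstream_len: int  # bases actually extracted downstream


def extract_flanks(contig: str, start: int, end: int, strand: str,
                   upstream: int, downstream: int,
                   allow_shorter: bool = False) -> Optional[FlankResult]:
    """Hypothetical helper: extract a gene plus separately sized flanks.

    start/end are 0-based, half-open gene coordinates on the contig.
    Flanks that run past a contig end are clamped; unless allow_shorter
    is True, such genes are skipped (None is returned).
    """
    if strand == "+":
        left_want, right_want = upstream, downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher contig coordinates.
        left_want, right_want = downstream, upstream
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")

    # Clamp each side to what the contig actually contains.
    left_start = max(0, start - left_want)
    right_end = min(len(contig), end + right_want)
    left_got = start - left_start
    right_got = right_end - end

    if not allow_shorter and (left_got < left_want or right_got < right_want):
        return None

    region = contig[left_start:right_end]
    if strand == "+":
        return FlankResult(region, left_got, right_got)
    # Report lengths in the gene's orientation after reverse-complementing.
    return FlankResult(revcomp(region), right_got, left_got)
```

For a minus-strand gene the two sides swap, since its upstream lies at higher contig coordinates; the region is then reverse-complemented so every output reads in the gene's orientation. Recording the lengths actually obtained makes it easy to flag or filter truncated genes later.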
liampshaw commented 1 month ago

Thanks for trying out the software and for taking the time to suggest these helpful improvements.

  1. I agree. Something I intended originally but didn't get round to adding. Hopefully this should be the most straightforward of your suggestions.
  2. I will have to think about how best to implement.
  3. I see the argument for having this and experimented with this option (the default config includes a --complete flag, but I think I didn't propagate it through the current code). The problem with allowing shorter sequences is that it introduces breaks in homology across the dataset that are really artefacts of sequencing. In your example, the left-hand blue block in the middle sequence would become another colour (e.g. yellow), and the corresponding stretch in the other sequences would also become yellow. That can be dealt with downstream, but it makes the visualizations harder to interpret. Still, it sounds like I should allow it as an option with an appropriate warning.
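One way to expose truncated flanks as an opt-in, as suggested above, is to keep only complete regions by default and emit a warning when the user asks for shorter ones. This is a sketch only: the flags `--upstream`, `--downstream` and `--allow_shorter` and the helper `filter_by_completeness` are hypothetical, not part of the script's current CLI, and the existing `--complete` config flag may work differently.

```python
import argparse
import warnings


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical options for illustration; not the script's actual CLI.
    p = argparse.ArgumentParser(description="Flanking-region options (sketch)")
    p.add_argument("--upstream", type=int, required=True,
                   help="bases to extract upstream of the gene")
    p.add_argument("--downstream", type=int, required=True,
                   help="bases to extract downstream of the gene")
    p.add_argument("--allow_shorter", action="store_true",
                   help="keep genes whose flanks are shorter than requested")
    return p


def filter_by_completeness(flank_lengths, upstream, downstream, allow_shorter):
    """Hypothetical helper.

    flank_lengths maps sequence id -> (upstream_got, downstream_got).
    Returns the sorted ids to keep.
    """
    complete = {k for k, (u, d) in flank_lengths.items()
                if u >= upstream and d >= downstream}
    if not allow_shorter:
        return sorted(complete)
    truncated = set(flank_lengths) - complete
    if truncated:
        warnings.warn(
            f"{len(truncated)} sequence(s) have flanks shorter than requested; "
            "breaks in homology near their ends may be sequencing artefacts.",
            stacklevel=2,
        )
    return sorted(flank_lengths)
```

Defaulting to complete regions keeps the plots comparable across sequences; users who opt in get every sequence plus a warning that some block boundaries may reflect where the sequence ends rather than real structural differences.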

Again, thank you for these comments! I really appreciate the time you've taken. I'm busy over the next couple of weeks but will do my best to implement these by the end of May.

Dx-wmc commented 1 month ago

Thank you for your consideration, and I hope to see the software get even better in the near future.