lorrainea / MARS

MARS: improving Multiple circular sequence Alignment using Refined Sequences
GNU General Public License v3.0

Error: P is too large! #14

Open ChaeheeLee opened 1 month ago

ChaeheeLee commented 1 month ago

Hello, first of all, thank you for developing and maintaining this very useful tool!

The issue I have right now is that MARS quits at the beginning or in the middle of the analysis with the error message "Error: P is too large!". Analyses with a small number of sequences run fine, but I get this error when the fasta file contains a large number of sequences (for example, more than 10,000 sequences of ~200 bp).

I ran it with the default parameters:

mars -a DNA -in input.fasta -o output.fasta -T 28

This was run with 32G CPU and 128G of memory allocated on the HPC.

Could you please give me any suggestions to solve this issue?

Thank you very much in advance.

lorrainea commented 4 weeks ago

Hi, I believe at least one of your sequences is very short (much less than 200 bp), so the default value of P is too large for it. P is used to refine the far ends of the sequence, so it can be relatively small.

You are getting the error because P * l > m/3, where P is the number of blocks to refine, l is the block length and m is your sequence length.

By default, l is the square root of the sequence length, so to avoid the error, please try reducing P.
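To make the condition concrete: with l = sqrt(m), the inequality P * l > m/3 is the same as m < 9 * P^2. Below is a minimal Python sketch of that arithmetic. The function names are hypothetical and not part of MARS, the value P = 5 in the usage note is only an illustration (not the MARS default), and MARS may round sqrt(m) internally, which this sketch does not model.

```python
import math


def p_too_large(m, P, l=None):
    """Return True if P * l > m / 3, the condition quoted in this thread.

    m: sequence length
    P: number of blocks to refine
    l: block length; assumed to default to sqrt(m) as described above
    """
    if l is None:
        # Assumption: plain square root, no rounding.
        l = math.sqrt(m)
    return P * l > m / 3


def min_safe_length(P):
    """Smallest integer m with P * sqrt(m) <= m / 3, i.e. m >= 9 * P**2."""
    return math.ceil(9 * P * P)
```

For example, `p_too_large(200, 5)` is true while `p_too_large(225, 5)` is false, because `min_safe_length(5)` is 225.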

If this does not work for you, please send over your input file so I can take a look at it.

Thanks, Lorraine

ChaeheeLee commented 3 weeks ago

Hi Lorraine,

Thank you so much for your prompt reply! Based on your answer, I found that some very short sequences were included.

I will try again after excluding those and give you an update.
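One way to exclude the short sequences is a small pre-filter on the FASTA file before running MARS. The sketch below is a hypothetical helper, not something shipped with MARS. It assumes a plain FASTA file with ">" header lines, and the length cutoff `min_len` is left to the user.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, min_len):
    """Split records into (kept, dropped) by sequence length."""
    kept, dropped = [], []
    for header, seq in records:
        (kept if len(seq) >= min_len else dropped).append((header, seq))
    return kept, dropped


def filter_fasta(in_path, out_path, min_len):
    """Write records of at least min_len bases to out_path; return the dropped headers."""
    with open(in_path) as handle:
        kept, dropped = split_by_length(read_fasta(handle), min_len)
    with open(out_path, "w") as out:
        for header, seq in kept:
            out.write(f">{header}\n{seq}\n")
    return [header for header, _ in dropped]
```

Calling `filter_fasta("input.fasta", "filtered.fasta", min_len)` writes the retained records and returns the headers of the dropped ones, so they can be inspected rather than silently lost.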

By the way, one more question: is there any way to speed up the analysis other than the "-m 0" setting?

Thank you!

Best, Chaehee

lorrainea commented 3 weeks ago

Hi, within MARS, -m 0 is one way to speed up the analysis, as is increasing q, but both will affect the accuracy of the results. If you have many sequences, some external pre-processing may also help. For example, if you cluster the sequences, you may find that some groups of sequences have a very high similarity score. In that case, you could pick only one sequence from each such group to rotate with the remaining sequences, and assume that the others in the same group will have a similar rotation point. This of course needs to be verified on your side.
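As a rough illustration of the clustering idea, here is a minimal greedy sketch in Python. It is not part of MARS and is not necessarily the clustering method intended above. It uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity score, and the 0.95 threshold is an arbitrary assumption. The comparison is done on the sequences as written (not as circles), so a strongly rotated copy tends to score low and is unlikely to be grouped with its unrotated counterpart. Whether group members really share a rotation point still has to be checked.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two sequences as written."""
    # autojunk=False: the default heuristic treats very frequent symbols
    # as junk once a sequence reaches 200 items, which would affect
    # every DNA base.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def greedy_cluster(records, threshold=0.95):
    """Assign each (header, seq) to the first representative it resembles.

    Returns a dict mapping each representative header to the headers
    in its group (the representative itself comes first).
    """
    reps = []      # (header, sequence) of each group's representative
    clusters = {}  # representative header -> member headers
    for header, seq in records:
        for rep_header, rep_seq in reps:
            if similarity(seq, rep_seq) >= threshold:
                clusters[rep_header].append(header)
                break
        else:
            # No representative was similar enough: start a new group.
            reps.append((header, seq))
            clusters[header] = [header]
    return clusters
```

The keys of the returned dict are the representatives that would be passed to MARS. This pairwise comparison is slow for tens of thousands of sequences, so a dedicated clustering tool may be the more practical choice at that scale.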

ChaeheeLee commented 3 weeks ago

Hi Lorraine,

Thank you for your suggestions. I will work on finding the best strategy! :)

Best, Chaehee