Open ChaeheeLee opened 1 month ago
Hi, I believe at least one of your sequences is very short (much less than 200 bp), so the default value of P is too large. P is used to refine the far ends of the sequence, so it can be relatively small.
You are getting the error because P * l > m/3 (where P is the number of blocks to refine, l is the block length and m is your sequence length).
By default, l is the square root of the sequence length, so to avoid the error, please try reducing P.
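As a rough illustration of the condition above, here is a minimal Python sketch that estimates the largest P that stays within P * l <= m/3 for a given sequence length. The function names are hypothetical, and the rounding of l (integer floor of the square root) is an assumption; MARS itself may round differently.

```python
import math


def block_length(m: int) -> int:
    # Assumption: l is the integer floor of sqrt(m); MARS may round differently.
    return math.isqrt(m)


def p_is_too_large(p: int, m: int) -> bool:
    # Mirrors the condition described above: error when P * l > m / 3.
    # Written with integers (3 * P * l > m) to avoid floating-point issues.
    return 3 * p * block_length(m) > m


def max_blocks_to_refine(m: int) -> int:
    # Largest P such that P * l <= m / 3, under the same rounding assumption.
    l = block_length(m)
    if l == 0:
        return 0
    return m // (3 * l)


if __name__ == "__main__":
    for m in (50, 200, 1000):
        print(m, block_length(m), max_blocks_to_refine(m))
```

Running this over the shortest sequence in the input gives an upper bound to try for P; the exact limit should still be confirmed against MARS.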
If this does not work for you, please send over your input file so I can take a look at it.
Thanks Lorraine
Hi Lorraine,
Thank you so much for your prompt reply! Based on your answer, I figured out that some very short sequences were included.
Let me try again after excluding those, and I will give you an update.
By the way, one more question: is there any way to speed up the analysis other than the "-m 0" setting?
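Excluding the short sequences could be done along these lines. This is only a sketch in plain Python: the helper names and the minimum-length cutoff are hypothetical, and the right cutoff depends on the P value in use.

```python
def read_fasta(lines):
    # Yield (header, sequence) pairs from an iterable of FASTA lines.
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short(records, min_len):
    # Keep only records whose sequence has at least min_len characters.
    # min_len is a hypothetical cutoff chosen by the user.
    return [(h, s) for h, s in records if len(s) >= min_len]


if __name__ == "__main__":
    example = [">a", "ACGT", "ACGT", ">b", "AC"]
    for header, seq in drop_short(read_fasta(example), min_len=5):
        print(f">{header}\n{seq}")
```

In practice the lines would come from an open file handle rather than a list, and the kept records would be written to a new FASTA file.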
Thank you!
Best, Chaehee
Hi, within MARS, -m 0 is one way to speed up the analysis, as is increasing q, but both will affect the accuracy of the results. There may also be some external pre-processing you can do if you have many sequences. For example, if you cluster the sequences, you may find that some groups of sequences have a very high similarity score. In that case, you could pick only one sequence from each group to rotate together with the remaining sequences, and assume that the others in the same group have a similar rotation point. This of course needs to be verified on your side.
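To make the clustering idea concrete, here is a minimal sketch of one possible pre-processing step. It is not part of MARS: it uses a greedy grouping by Jaccard similarity of circular k-mer sets, and the function names, k, and the threshold are all assumptions chosen for illustration. A dedicated clustering tool would likely be a better choice for more than 10,000 sequences.

```python
def circular_kmers(seq, k=4):
    # Wrap around the end so that a rotated copy yields the same k-mer set.
    # Assumes len(seq) >= k.
    extended = seq + seq[: k - 1]
    return {extended[i : i + k] for i in range(len(seq))}


def jaccard(a, b):
    # Jaccard similarity between two sets; 1.0 if both are empty.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pick_representatives(seqs, k=4, threshold=0.9):
    # Greedy grouping: each sequence joins the first representative it
    # resembles closely enough, otherwise it becomes a new representative.
    reps = []        # list of (index, k-mer set)
    assignment = {}  # sequence index -> representative index
    for i, seq in enumerate(seqs):
        ks = circular_kmers(seq, k)
        for rep_idx, rep_ks in reps:
            if jaccard(ks, rep_ks) >= threshold:
                assignment[i] = rep_idx
                break
        else:
            reps.append((i, ks))
            assignment[i] = i
    return [idx for idx, _ in reps], assignment


if __name__ == "__main__":
    seqs = ["ACGTTGCA", "GTTGCAAC", "TTTTTTTT"]
    print(pick_representatives(seqs))
```

Only the representatives would then be passed to MARS. Whether the other members of a group really share a similar rotation point still has to be checked, as noted above.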
Hi Lorraine,
Thank you for your suggestions. I will work on finding the best strategy! :)
Best, Chaehee
Hello, First of all, thank you for the development and continuous maintenance of this very useful tool!
The issue I have right now is that MARS quits at the beginning or in the middle of the analysis with the error message "Error: P is too large!". Some analyses with a small number of sequences run fine, but when I have a large number of sequences in the FASTA file (for example, more than 10,000 sequences of ~200 bp), I get this error.
I tried to run with the default parameters:
Could you please give me any suggestions to solve this issue?
Thank you very much in advance.