ranwez / MACSE_V2_PIPELINES

This repository provides source code for several pipelines dedicated to the alignment of nucleotide coding sequences that are based on MACSE. These pipelines are mostly bash scripts encapsulated within singularity containers and sometimes combined in nextflow workflows.
https://bioweb.supagro.inra.fr/macse/

java.lang.StringIndexOutOfBoundsException: String index out of range: #12

Closed josieparis closed 2 weeks ago

josieparis commented 4 weeks ago

Hi everyone,

I'm having a sticky Java issue that I can't seem to fix running the command line version of MACSE (macse_v2.07.jar). I hope this is the correct place to seek guidance on the issue!

The code I am running:

java -jar macse_v2.07.jar -prog exportAlignment -align input.fasta -codonForFinalStop --- -codonForInternalStop NNN

I'm running this code on > 1000 fasta files. For some, the software runs just fine and I get the expected NT and AA records as output. However, for about 30% of the fasta files, I get the Java Error:

java.lang.StringIndexOutOfBoundsException: String index out of range: XXXX

I've been running the program on two different fasta files to see if I can figure out the issue, but no luck.

I also ran the code with the -debug option, so the full output is:

java.lang.StringIndexOutOfBoundsException: String index out of range: 1181
    at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
    at java.base/java.lang.String.charAt(String.java:693)
    at sequences.GapsRestriction.completeBuilders(GapsRestriction.java:81)
    at sequences.GapsRestriction.computeBuilders(GapsRestriction.java:55)
    at sequences.GapsRestriction.computeRestrictedSequences(GapsRestriction.java:21)
    at sequences.AbstractSeqSet.computeGapsRestriction(AbstractSeqSet.java:203)
    at programs.export.ExportAlignment.execute(ExportAlignment.java:157)
    at cli.CLI_program.parse(CLI_program.java:110)
    at cli.CLI_api.parse(CLI_api.java:234)
    at main.MacseMain.parse(MacseMain.java:111)
    at main.MacseMain.filterCommands(MacseMain.java:54)
    at main.MacseMain.main(MacseMain.java:203)

I'm also attaching here two fasta files, one that works fine (OG0006556_cds.fasta) and one that throws the above error (OG0006589_cds.fasta).

Any help greatly appreciated!

Thanks :)

Josie

josieparis commented 4 weeks ago

OG0006556_cds.fasta.txt

OG0006589_cds.fasta.txt

ranwez commented 3 weeks ago

Hi Josie,

Thank you for using MACSE and reaching out here with your question—this is indeed the right place for any inquiries related to MACSE or the MACSE pipeline.

The subprogram you’re attempting to use, “exportAlignment,” is specifically designed to handle alignments as input, as indicated by its name and the “-align” parameter. The files you’re providing are unaligned sequences, which is why you’re encountering an issue (I’m not certain why one file crashes while the other doesn’t, but neither should be accepted since they aren’t aligned).

The primary purpose of the “exportAlignment” subprogram is to convert alignments produced by MACSE, which may contain internal stop codons and frameshifts (denoted by exclamation marks ‘!’), into conventional alignment formats.
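A quick sanity check before calling exportAlignment is to confirm that every sequence in the file has the same length, since that holds for any alignment. The Python sketch below is only an illustration and is not part of MACSE; the helper names read_fasta and looks_aligned are hypothetical. Equal lengths are necessary but not sufficient, so a file that passes may still not be a meaningful alignment.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:], []
        elif name is not None:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def looks_aligned(records):
    """Return True if there is at least one sequence and all lengths match.

    Hypothetical helper: equal length is a necessary condition for an
    alignment, not proof that the file is one.
    """
    lengths = {len(seq) for _, seq in records}
    return len(lengths) == 1
```

Files for which looks_aligned returns False contain sequences of different lengths and are not valid input for -align.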

However, it seems that your goal is to translate nucleotide sequences into amino acid sequences. For that, you should use the “translateNT2AA” subprogram. I tested your files with the following commands:

java -jar macse_v2.07.jar -prog translateNT2AA -seq OG0006589_cds.fasta
java -jar macse_v2.07.jar -prog translateNT2AA -seq OG0006556_cds.fasta

Both worked successfully.
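Since the original question involved more than 1000 files, the same command can be run in a loop. The Python sketch below is one possible way to do that, not an official MACSE workflow. It uses only the options shown above (-prog translateNT2AA and -seq). The function names, the "*_cds.fasta" file pattern and the dry_run switch are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def translate_command(fasta, jar="macse_v2.07.jar"):
    """Build the translateNT2AA command line for one FASTA file."""
    return ["java", "-jar", str(jar), "-prog", "translateNT2AA", "-seq", str(fasta)]


def translate_all(directory, pattern="*_cds.fasta", jar="macse_v2.07.jar", dry_run=True):
    """Build (and optionally run) one command per matching file.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned so they can be inspected first.
    """
    commands = [translate_command(p, jar) for p in sorted(Path(directory).glob(pattern))]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first file on which MACSE fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling translate_all(".") first lists the commands without running them; passing dry_run=False then executes them one by one.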

Please note that the translation is performed using the default genetic code, which is suitable in most cases. However, if you require a different genetic code, you can specify it using the -gc_def option, which takes the NCBI genetic code number. For example, -gc_def 5 selects the invertebrate mitochondrial code, while -gc_def 2 selects the vertebrate mitochondrial code.

Please let me know if this resolves your issue, and don’t hesitate to ask if you have any further questions.

Best regards,

Vincent

josieparis commented 2 weeks ago

Hi Vincent,

Apologies for my late response, and thank you so much for clarifying the parameter usage (my bad!).

I can confirm that all is working now and I'll close the issue.

Bests,

Josie