lpantano / seqbuster

pipeline for the analysis of small RNA data

miraligner error #23

Open palbioinfor opened 6 years ago

palbioinfor commented 6 years ago

I have 48 files to align, of which 38 were aligned. Although I did not get alignment statistics for every file, the .mirna files showed the aligned sequences. I have no idea why I am getting the error below with the remaining 10 fastq files:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 79, end 82, length 79
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3107)
    at java.base/java.lang.String.substring(String.java:1873)
    at miraligner.map.readseq(map.java:285)
    at miraligner.Main.main(Main.java:86)

These 10 fastq files are not even that large (some are smaller than the ones that aligned). I also noticed other Java out-of-bounds errors with the files that did align, but they were still aligned. How can I get these 10 files aligned without this error? For these files I get empty '.mirna' and '.nomap' files.

P.S. I ran the same commands with an empty 'Mirna.str' file and, interestingly, I got alignment statistics for these 10 files too, but both the output .mirna file and the .nomap file were blank.
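For readers unfamiliar with the message in the trace above: Java's `String.substring(begin, end)` throws when `end` is larger than the string length, and "begin 79, end 82, length 79" reports exactly that case. The sketch below mimics that bounds check in Python to show what the numbers mean. The helper name `java_substring` and the 79-character test string are hypothetical; this is not miraligner's code, and which string miraligner was slicing is not stated in the thread.

```python
def java_substring(s: str, begin: int, end: int) -> str:
    """Mimic the bounds check of Java's String.substring(begin, end).

    Hypothetical helper for illustration only.
    """
    if begin < 0 or begin > end or end > len(s):
        # Same three numbers the Java exception message reports.
        raise IndexError(f"begin {begin}, end {end}, length {len(s)}")
    return s[begin:end]


seq = "A" * 79  # hypothetical 79-character string

try:
    # Asking for positions 79..82 of a 79-character string fails,
    # because the requested end lies past the end of the string.
    java_substring(seq, 79, 82)
except IndexError as err:
    message = str(err)

print(message)
```

In other words, some code path asked for three characters starting at the very end of a 79-character string, which is consistent with a coordinate pointing past the end of a sequence.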

lpantano commented 6 years ago

Hi,

sorry about the issue and thanks for reporting.

Can you tell me what species you are working with and how you prepared the hairpin.fa file?

That file needs to have two lines per precursor: one with the name and the other with the sequence. Some precursors span three lines if you download the file from miRBase directly. Sorry if this is something you already did; I am trying to think what it could be.

The last option is to send me one of the files that is giving the error, together with the hairpin.fa and the miRNA.str, and I would be happy to help.

Thanks!

palbioinfor commented 6 years ago

Hi, I am working on Homo sapiens. I have a miRNA annotation file that was produced in-house (not from miRBase). It looks like this:

hsa-mir-551a
GGUGACCCUGGAAAUCCAGAGUGGGUGGGGCCAGUCUGACCGUUUCUAGGCGACCCACUC
UUGGUUUCCAGGGUUGCCC
hsa-mir-34a
AGUGUUUCUUUGGCAGUGUCUUAGCUGGUUGUUGUGAGCAAUAGUAAGGAAGCAAUCAGC
AAGUAUACUGCCCUAGAAGUGCUGC

Although I am using the .str file from miRBase, 38 samples completed. I do not understand the problem with the others. I also tried not taking substitutions into account for a couple of samples (i.e. sub=0), and then I was able to align them. But I am not sure whether I can or should skip substitutions.

Also, is there any way to calculate the percentage of isomiRs with respect to mature miRNAs?

lpantano commented 6 years ago

Hi,

Can you make sure the FASTA file has only one line with the sequence, something like this:

hsa-mir-551a
GGUGACCCUGGAAAUCCAGAGUGGGUGGGGCCAGUCUGACCGUUUCUAGGCGACCCACUCUUGGUUUCCAGGGUUGCCC
hsa-mir-34a
AGUGUUUCUUUGGCAGUGUCUUAGCUGGUUGUUGUGAGCAAUAGUAAGGAAGCAAUCAGCAAGUAUACUGCCCUAGAAGUGCUGC
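A minimal sketch of how a wrapped FASTA file could be rewritten so that each precursor has exactly one name line and one sequence line. It assumes standard FASTA headers starting with `>` (the excerpts pasted in this thread do not show that character, possibly because it was lost in rendering), and the function names `linearize_fasta` and `format_two_line` are hypothetical, not part of seqbuster.

```python
from typing import Dict, Iterable


def linearize_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Join wrapped sequence lines so each record has a single sequence string."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumption: the record name is the first word after '>'.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line  # append the wrapped sequence chunk
    return records


def format_two_line(records: Dict[str, str]) -> str:
    """Render records as two lines each: header, then the full sequence."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


# Example using the wrapped hsa-mir-551a record quoted in this thread.
wrapped = [
    ">hsa-mir-551a",
    "GGUGACCCUGGAAAUCCAGAGUGGGUGGGGCCAGUCUGACCGUUUCUAGGCGACCCACUC",
    "UUGGUUUCCAGGGUUGCCC",
]
records = linearize_fasta(wrapped)
two_line = format_two_line(records)
print(two_line, end="")
```

To apply it to a real file, the same functions can be fed an open file handle, for example `linearize_fasta(open("hairpin.fa"))`, and the result of `format_two_line` written to a new file.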

Also, if the hairpin.fa is not from miRBase but the miRNA.str is, you need to make sure all the coordinates in miRNA.str fall inside the sequences you have in the hairpin.fa file; if not, there will be errors.
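The coordinate check described above could be sketched as follows. This assumes that miRNA.str header lines look like `>precursor (energy) [mature:start-end] [mature:start-end]` with 1-based inclusive coordinates, which is my understanding of the miRBase layout and should be verified against the actual file. The function names and the `hyp-` records are hypothetical, and this is not how miraligner itself validates its input.

```python
import re
from typing import Dict, List, Tuple

# Matches entries such as [hyp-miR-1-5p:2-10] on a header line.
COORD = re.compile(r"\[([^\]:\s]+):(\d+)-(\d+)\]")


def parse_str_headers(lines: List[str]) -> Dict[str, List[Tuple[str, int, int]]]:
    """Collect (mature name, start, end) per precursor from '>' header lines."""
    coords: Dict[str, List[Tuple[str, int, int]]] = {}
    for line in lines:
        if line.startswith(">"):
            precursor = line[1:].split()[0]
            coords[precursor] = [
                (m.group(1), int(m.group(2)), int(m.group(3)))
                for m in COORD.finditer(line)
            ]
    return coords


def out_of_range(coords, hairpins: Dict[str, str]) -> List[str]:
    """Report mature coordinates that do not fit inside the hairpin sequence."""
    problems = []
    for precursor, matures in coords.items():
        seq = hairpins.get(precursor)
        if seq is None:
            problems.append(f"{precursor}: missing from hairpin file")
            continue
        for mature, start, end in matures:
            # Assumption: coordinates are 1-based and inclusive.
            if start < 1 or start > end or end > len(seq):
                problems.append(
                    f"{mature}: {start}-{end} outside {precursor} (length {len(seq)})"
                )
    return problems


# Hypothetical data: a 20-nt hairpin whose 3p coordinates run past its end.
str_lines = [">hyp-mir-1 (-10.00)   [hyp-miR-1-5p:2-10] [hyp-miR-1-3p:15-25]"]
hairpins = {"hyp-mir-1": "ACGU" * 5}
problems = out_of_range(parse_str_headers(str_lines), hairpins)
for p in problems:
    print(p)
```

Any entry this reports is a candidate cause for an out-of-bounds error, since the annotated mature region would extend beyond the precursor sequence supplied in hairpin.fa.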

You can work with the isomiRs BioC package and load your samples there. Then you can use isoCounts(ids, ref=TRUE) (http://lpantano.github.io/isomiRs/reference/isoCounts.html) to get two lines per miRNA, one for the reference and another for isomiRs; with some tidyverse code you can get that calculation. I'll open an issue to add this feature, but it won't happen until next week.
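The calculation itself is simple once the reference and isomiR counts per miRNA are available. The sketch below does not call the isomiRs package (which is R); it assumes the two counts per miRNA have already been exported, and the function name, miRNA names, and numbers are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def isomir_percentage(
    counts: Dict[str, Tuple[int, int]]
) -> Dict[str, Optional[float]]:
    """Percentage of isomiR reads out of all reads (reference + isomiR) per miRNA.

    `counts` maps a miRNA name to (reference_count, isomir_count).
    Returns None for a miRNA with no reads at all.
    """
    result: Dict[str, Optional[float]] = {}
    for mirna, (ref, iso) in counts.items():
        total = ref + iso
        result[mirna] = 100.0 * iso / total if total else None
    return result


# Hypothetical counts, for illustration only.
counts = {
    "hyp-miR-1": (75, 25),  # 75 reference reads, 25 isomiR reads
    "hyp-miR-2": (0, 0),    # not detected
}
percentages = isomir_percentage(counts)
print(percentages)
```

Whether the denominator should be all reads for the miRNA (as here) or only the reference reads depends on how "percentage of isomiRs with respect to mature miRNAs" is defined for the analysis.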

Thanks for the ideas, and I hope this helps.
