mhuttner / miRA

GNU General Public License v2.0

miRA - errors running full or batch #3

Open BriMeireles opened 6 years ago

BriMeireles commented 6 years ago

Hello,

I ran miRA full (latest version) on four different data sets, and only one of them produced an error: ERROR: initialize_Lfold: argument must be greater 0. On your GitHub page (https://github.com/mhuttner/miRA) this error is described as related to an out-of-memory issue, but my server has 1 TB of memory. With that much memory available, I wonder whether a memory limit for the JVM is defined somewhere in miRA's compilation process.

Besides that, I followed your suggestion and ran miRA batch instead, which gave me a different error: Error: Could not find or load main class fr.orsay.lri.varna.applications.VARNAcmd. I am now trying to solve this problem.

Could you please confirm whether there is a limit on the JVM's memory usage? Thanks.

mhuttner commented 6 years ago

Hello, miRA uses slightly modified code from the ViennaRNA package (https://github.com/ViennaRNA/ViennaRNA) for folding. Unfortunately, big modifications to this library are out of scope. The folding is done in C++, and as far as I know there is no set memory limit.

Your second problem occurs because miRA tries to draw the folds using VARNA, which is a Java program. miRA requires the VARNA binary (https://github.com/mhuttner/miRA/blob/master/VARNAv3-91.jar) in the local folder. Did you install it there?
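A quick way to rule out a damaged or incomplete jar is to check that the class named in the error is actually inside it. The sketch below is only an illustration and is not part of miRA; the helper name `jar_has_class` is made up for this example. It relies on the fact that a jar is a zip archive in which a class `a.b.C` is stored as `a/b/C.class`.

```python
import zipfile


def jar_has_class(jar_path, class_name):
    """Return True if the jar (a zip archive) contains the compiled class.

    `jar_has_class` is a hypothetical helper for this example only.
    """
    # A fully qualified class name maps to a path inside the archive.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


# Example call (paths are assumptions; adjust to your installation):
# jar_has_class("VARNAv3-91.jar", "fr.orsay.lri.varna.applications.VARNAcmd")
```

If this returns True and the error persists, the jar itself is probably fine, and a likely remaining cause is that Java does not find the jar on its classpath, for example because a relative jar path is resolved against a different working directory than the one miRA was started from.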

Best Regards, Michael

BriMeireles commented 6 years ago

Yes, I have the VARNA binary in the main folder of miRA. That's why I don't understand the error "Could not find or load main class fr.orsay.lri.varna.applications.VARNAcmd".

In fact, when I ran miRA in full mode on four different data sets, only one of the four runs crashed, and that one crashed due to memory issues. Can you tell me where in the code the VARNA jar file is called? Maybe the JVM memory needs to be increased in that call.

mhuttner commented 6 years ago

Sure, VARNA is called in https://github.com/mhuttner/miRA/blob/master/src/reporting.c; the function is called “create_structure_image”. Feel free to create a PR with improvements.

Michael

BriMeireles commented 6 years ago

I ran miRA in full mode with verbose output enabled to see where it crashed. It crashed during folding, on cluster 21707 (of 30158 clusters). It can't be a memory problem: we have 1 TB of RAM. I've been looking at the code, and it looks like the problem is here: "if (length <1)". From my point of view, this can't be related to memory.

Regarding batch mode, the tests ran smoothly, but it crashes when I run my own samples. I confirmed that the VARNA binary is in the main folder.

mhuttner commented 6 years ago

If (length <1) is true, that would mean an empty sequence string is being passed to the folding function.

Could you look up this cluster in the output file "cluster_contigs.bed"? It should be line 21707. You could also try removing the other lines from the BED file and then passing it to the folding command, miRA fold [-c config file] [-o output file] [-v]. This will tell you whether this specific cluster is problematic or whether it is a global issue.
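The isolation step can be sketched as follows. This is a minimal example, not miRA code: the function names and output file names are made up, and it assumes that the cluster number matches the 1-based line number in the BED file, with no header lines shifting the count.

```python
def split_bed_line(lines, line_number):
    """Split lines into (the 1-based target line alone, all remaining lines)."""
    if not 1 <= line_number <= len(lines):
        raise ValueError(f"line {line_number} is outside 1..{len(lines)}")
    idx = line_number - 1
    return [lines[idx]], lines[:idx] + lines[idx + 1:]


def write_split(bed_path, line_number, only_path, rest_path):
    """Write the target line and the remaining lines to two separate files."""
    with open(bed_path) as fh:
        lines = fh.readlines()
    only, rest = split_bed_line(lines, line_number)
    with open(only_path, "w") as fh:
        fh.writelines(only)
    with open(rest_path, "w") as fh:
        fh.writelines(rest)


# Example call (output file names are hypothetical):
# write_split("cluster_contigs.bed", 21707, "only_21707.bed", "without_21707.bed")
```

Each of the two resulting files can then be passed to miRA fold separately to see which one, if either, reproduces the error.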

Michael

BriMeireles commented 6 years ago

I can't understand how an empty sequence is being passed to the folding function. I checked the FASTA and the BED file: the sequence exists.

I removed line 21707 from the BED file and then ran miRA fold (as you suggested). The run finished without any problem. I also ran miRA fold with a BED file containing only the cluster on line 21707, and it also ran without any problem.

I was thinking about merging the results of the full set (with this cluster removed) with the results of the run on this cluster alone, in order to run the last step of miRA. Can I do so? I'm not sure whether the folding process needs information about the other clusters.
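For reference, this kind of check can be automated for every cluster at once. The sketch below is an illustration under stated assumptions, not miRA's own logic: it assumes the BED file uses the standard first three columns (sequence name, 0-based start, exclusive end) and that the FASTA headers start with the same sequence names. Both function names are made up for this example.

```python
def read_fasta_lengths(path):
    """Return {sequence name: length} for a FASTA file."""
    lengths = {}
    name = None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                lengths[name] = 0
            elif name is not None:
                lengths[name] += len(line)
    return lengths


def suspicious_intervals(bed_lines, lengths):
    """Yield (line_number, reason) for intervals that cannot give a full sequence."""
    for n, line in enumerate(bed_lines, start=1):
        if line.startswith(("#", "track", "browser")):
            continue  # skip comment and header lines
        fields = line.split()
        if len(fields) < 3:
            continue  # not a BED record
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in lengths:
            yield n, "unknown sequence name"
        elif end - start < 1:
            yield n, "empty interval"
        elif start < 0 or end > lengths[chrom]:
            yield n, "interval outside sequence"
```

An empty result would suggest that the input coordinates are consistent and that the empty string arises somewhere inside the folding step rather than in the input files.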

mhuttner commented 6 years ago

Hello, miRA batch essentially does this: it splits the file into smaller batches, each containing a single chromosome. If you merge the files, the folding would be correct; only the coverage analysis in the reports would be incorrect. If you merge the .miRA files directly after the fold command and then run miRA coverage, everything would be correct.
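The idea of per-chromosome batches can be illustrated with a short sketch. This is not how miRA batch is implemented; it only shows the grouping concept on BED-style lines, assuming the first column holds the chromosome name. The function name is made up for this example.

```python
def group_by_chromosome(bed_lines):
    """Group BED lines by their first column, keeping first-seen order."""
    batches = {}
    for line in bed_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        batches.setdefault(fields[0], []).append(line)
    return batches
```

Each value of the returned dictionary holds the lines for one chromosome, which could be written to its own file and processed independently.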

I can't think of why this bug would occur this way, but we have seen problematic behaviour in the Lfold library before with large inputs. Sorry for the inconvenience.

Michael

BriMeireles commented 6 years ago

Hi, I will do that: merge the two files before running miRA coverage.

Thanks for your help.