jaswindersingh2 / SPOT-RNA

RNA Secondary Structure Prediction using an Ensemble of Two-dimensional Deep Neural Networks and Transfer Learning.
https://apisz.sparks-lab.org:8443/spot-rna-sz.html
Mozilla Public License 2.0

tensorflow.Example exceeds maximum protobuf size of 2GB #5

Closed wososa closed 4 years ago

wososa commented 4 years ago

Dear developers of SPOT-RNA,

I am excited to try SPOT-RNA. When I ran it, I got this error: "tensorflow.Example exceeds maximum protobuf size of 2GB". Is there a length limit for the input sequence? I am running on a CPU rather than a GPU. Could you please help me?

Also, how do I set the number of threads (CPUs)?

Thanks, Woody

jaswindersingh2 commented 4 years ago

Hi Woody,

It looks like your input sequence is very long (>6,000 nts), or the batch of input sequences is large. If possible, can you please tell me the length of the input sequence or, if you are using batch input, the number of sequences in the batch and the maximum sequence length in it?
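For reference, those numbers can be read off the input file with a few lines of Python. This is only a sketch and not part of SPOT-RNA: the function name fasta_lengths and the example records are made up for illustration, and it assumes the input is a plain FASTA file.

```python
from typing import Dict, Iterable


def fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {record_name: sequence_length} for FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            lengths[name] = 0
        elif name is not None:
            # Sequence lines may be wrapped, so accumulate their lengths.
            lengths[name] += len(line)
    return lengths


# Made-up records; for a real file, pass an open file handle instead,
# e.g. fasta_lengths(open("my_input.fasta")).
example = [">seq1 test", "GGGAAACCC", "AUGC", ">seq2", "ACGU"]
report = fasta_lengths(example)
for record, length in report.items():
    print(record, length)
print("records:", len(report), "longest:", max(report.values()))
```

The record count and the longest length are the two numbers asked about above.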

We have tested SPOT-RNA on a single sequence (not in a batch) of up to 6,000 nts, and it ran without any issue. SPOT-RNA was trained on sequences of up to 500 nts, and in the paper we compared it with other predictors on sequences of up to 1,500 nts. We found that SPOT-RNA is better than the other predictors when the sequence length is less than 500 nts, because 500 nts is the maximum length it was trained on.

If you want a prediction for a longer sequence, it is better to search for thermodynamically stable regions/motifs in the long RNA sequence and make predictions for those smaller regions/motifs. As far as I know, it is unlikely that a long full-length sequence has a stable secondary structure. You can also check the RNAplfold predictor (available at https://www.tbi.univie.ac.at/RNA/RNAplfold.1.html), which predicts locally stable secondary structures from a long RNA sequence; these local predictions can then be combined. A similar approach can be used to obtain SPOT-RNA predictions for a very long sequence.
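As a rough illustration of the "predict on smaller regions" idea, a long sequence can be cut into overlapping windows that each stay within the 500 nt training length. This is a minimal sketch, not SPOT-RNA or RNAplfold functionality: the function name sliding_windows and the window and step sizes are assumptions, and it uses fixed windows rather than a search for thermodynamically stable regions.

```python
from typing import List, Tuple


def sliding_windows(seq: str, window: int = 500, step: int = 250) -> List[Tuple[int, str]]:
    """Split seq into overlapping windows; returns (0-based start, subsequence) pairs."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if step > window:
        raise ValueError("step must not exceed window, or parts of seq are skipped")
    if len(seq) <= window:
        return [(0, seq)]
    starts = list(range(0, len(seq) - window + 1, step))
    # Add a final window flush with the 3' end if the regular grid misses it.
    if starts[-1] + window < len(seq):
        starts.append(len(seq) - window)
    return [(s, seq[s:s + window]) for s in starts]


# Made-up 1,200 nt sequence, written out as one FASTA record per window.
long_seq = "ACGU" * 300
windows = sliding_windows(long_seq, window=500, step=250)
for start, sub in windows:
    print(f">window_{start + 1}_{start + len(sub)}")
    print(sub)
```

Each window can then be predicted separately. How to reconcile base pairs predicted in overlapping windows is left open here; the overlap only makes it less likely that a local structure is cut at a window boundary.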

Regarding the number of threads, a flag "--cpu xx" has now been added to specify the number of threads that SPOT-RNA is allowed to use.
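A small sketch of how such a call might be assembled from Python follows. Only the "--cpu" flag comes from this thread; the script name SPOT-RNA.py, the "--inputs" and "--outputs" options, and the file names are assumptions that should be checked against the repository README. The code only builds and prints the command, it does not run it.

```python
import os
import shlex
from typing import List, Optional


def build_command(fasta_path: str, out_dir: str, threads: Optional[int] = None) -> List[str]:
    """Assemble an argument list for a SPOT-RNA run (names other than --cpu are assumed)."""
    if threads is None:
        # Fall back to all CPUs visible to Python, or 1 if that is unknown.
        threads = os.cpu_count() or 1
    return [
        "python3", "SPOT-RNA.py",   # assumed script name
        "--inputs", fasta_path,     # assumed option
        "--outputs", out_dir,       # assumed option
        "--cpu", str(threads),      # flag described in this thread
    ]


cmd = build_command("my_rna.fasta", "outputs/", threads=4)
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) once the names have been verified.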

Jaswinder

wososa commented 4 years ago

@jaswindersingh2 ,

Thanks for the detailed explanation. I will limit my input sequences to fewer than 500 nt. Thanks for adding the --cpu parameter!

Best, Woody