yukang123 closed this issue 10 months ago.
Hi, I think you'd better download it directly from the RNAcentral website. It recently published a new release with more sequences. You can then cluster it with cd-hit using your own settings.
Thanks! Sorry, I am not familiar with cd-hit. Is cd-hit the preprocessing step you used for pretraining? Where can I find the relevant scripts?
You can try following their manual: https://sites.google.com/view/cd-hit
Got it, I will check it. It seems that I just need to replace T with U and use cd-hit to reduce redundancy. Am I right?
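A minimal sketch of the two steps described above, assuming the downloaded FASTA uses T and that `cd-hit-est` (the nucleotide variant of cd-hit) is on your PATH. The helper names, the order (cluster first, then convert the representatives), and the default identity of 1.0 are all illustrative assumptions, not the maintainers' actual pipeline; pick the threshold and word size from the cd-hit manual for your own setting.

```python
import subprocess
from typing import Iterable, Iterator, List


def dna_to_rna_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with T/t replaced by U/u on sequence lines only.

    Header lines (starting with '>') pass through unchanged so that
    sequence identifiers are never altered.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
        else:
            yield line.replace("T", "U").replace("t", "u")


def convert_file(src: str, dst: str) -> None:
    """Stream a FASTA file from src to dst, converting T to U."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in dna_to_rna_fasta(fin):
            fout.write(line)


def cd_hit_est_command(
    src: str,
    dst: str,
    identity: float = 1.0,  # assumption: 100% identity; tune to taste
    word_size: int = 10,    # the manual suggests -n 10 or 11 for 0.95-1.0
    threads: int = 0,       # -T 0 uses all CPUs
    memory_mb: int = 0,     # -M 0 removes the memory limit
) -> List[str]:
    """Build (but do not run) a cd-hit-est command line."""
    return [
        "cd-hit-est",
        "-i", src,
        "-o", dst,
        "-c", str(identity),
        "-n", str(word_size),
        "-T", str(threads),
        "-M", str(memory_mb),
    ]


def run_pipeline(raw_fasta: str, clustered_fasta: str, rna_fasta: str) -> None:
    """Cluster with cd-hit-est, then convert the representatives to U."""
    subprocess.run(cd_hit_est_command(raw_fasta, clustered_fasta), check=True)
    convert_file(clustered_fasta, rna_fasta)
```

For example, `run_pipeline("rnacentral.fasta", "clustered.fasta", "clustered_rna.fasta")` would write the cluster representatives to `clustered.fasta` and a U-converted copy to `clustered_rna.fasta` (file names are placeholders). Converting only sequence lines avoids accidentally rewriting a `T` inside a header.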
Hi guys! I am curious whether I can access the preprocessed RNAcentral100 dataset that was used to pretrain the foundation model. If not, should I download the data directly from the RNAcentral website? https://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/19.0/
Thanks a lot!