steineggerlab / foldcomp

Compressing protein structures effectively with torsion angles
GNU General Public License v3.0

Extracting sequences from Foldcomp database #34

Closed valentynbez closed 1 year ago

valentynbez commented 1 year ago

Hello,

First of all, thank you for the amazing library! Is there a way to extract sequences from the database? It is possible to retrieve them with the Python interface, but that is too slow in my case. The CLI doesn't let me do it; I receive errors:

 ./foldcomp extract --fasta -t 16 afdb_swissprot_v4 fasta_test        
[Error] processing db entry AF-P80216-F1-model_v4
[Error] processing db entry AF-B7J2B3-F1-model_v4 failed.AF-P02460-F1-model_v4 failed.
[Error] File is not a valid fcz file
AF-Q8LBQ7-F1-model_v4 failed.

Any advice on how to extract sequences more efficiently?

khb7840 commented 1 year ago

I'm looking into the error. Thank you for raising this issue 😄

khb7840 commented 1 year ago

Could you try with the latest version? I didn't get the error when running your command on afdb_swissprot_v4.

valentynbez commented 1 year ago

The previous version was a Linux binary downloaded using wget.

I have now built from the latest GitHub source using build.sh and ran it on afdb_swissprot_v4, downloaded from the lab website, with this command:

 ./foldcomp extract --fasta -t 16 afdb_swissprot_v4 fasta_test

I am receiving the same errors:

[Error] File is not a valid fcz file
[Error] processing db entry AF-Q2S902-F1-model_v4 failed.
[Error] File is not a valid fcz file

cmake 3.26.1, GCC 10.2.0, OpenMP 4.5

milot-mirdita commented 1 year ago

Could you compute md5sums for the various afdb_swissprot_v4* files?

I'll compute the expected md5 sums tomorrow and post them here.

valentynbez commented 1 year ago

Sure!

1cfbec179a06f0fbba0a8f7880713098  afdb_swissprot_v4
740bab4f9ec8808aedb68d6b1281aeb2  afdb_swissprot_v4.dbtype
05e8b8fba2544b88f9188e1eb1131d8a  afdb_swissprot_v4.index
3a1da264a0877bf142e134a6c0f00d4a  afdb_swissprot_v4.lookup
61ffaa1f49241f9ddce58a0d2872dccd  afdb_swissprot_v4.source
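
khb7840 commented 1 year ago

It seems your database file differs from the released one: the md5sum of the main afdb_swissprot_v4 file does not match, while the other four files do. Please compare against the checksums below, or download the database again with the Python API using foldcomp.setup("afdb_swissprot_v4").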

00aa35452babd2b20b4df01264eac860  afdb_swissprot_v4
740bab4f9ec8808aedb68d6b1281aeb2  afdb_swissprot_v4.dbtype
05e8b8fba2544b88f9188e1eb1131d8a  afdb_swissprot_v4.index
3a1da264a0877bf142e134a6c0f00d4a  afdb_swissprot_v4.lookup
61ffaa1f49241f9ddce58a0d2872dccd  afdb_swissprot_v4.source
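
To spot which file differs without comparing the digests by eye, the two listings from this thread can be diffed with a short Python sketch. The function names parse_md5_listing and mismatched_files are illustrative helpers, not part of foldcomp; the checksum values are copied from the comments above.

```python
def parse_md5_listing(text):
    """Parse 'digest  filename' lines (md5sum output) into {filename: digest}."""
    checksums = {}
    for line in text.strip().splitlines():
        digest, name = line.split(maxsplit=1)
        checksums[name.strip()] = digest
    return checksums


def mismatched_files(local, expected):
    """Return the sorted names of files whose local digest is missing or differs."""
    return sorted(
        name for name, digest in expected.items() if local.get(name) != digest
    )


# Checksums reported for the local download.
LOCAL = """
1cfbec179a06f0fbba0a8f7880713098  afdb_swissprot_v4
740bab4f9ec8808aedb68d6b1281aeb2  afdb_swissprot_v4.dbtype
05e8b8fba2544b88f9188e1eb1131d8a  afdb_swissprot_v4.index
3a1da264a0877bf142e134a6c0f00d4a  afdb_swissprot_v4.lookup
61ffaa1f49241f9ddce58a0d2872dccd  afdb_swissprot_v4.source
"""

# Checksums of the released database.
EXPECTED = """
00aa35452babd2b20b4df01264eac860  afdb_swissprot_v4
740bab4f9ec8808aedb68d6b1281aeb2  afdb_swissprot_v4.dbtype
05e8b8fba2544b88f9188e1eb1131d8a  afdb_swissprot_v4.index
3a1da264a0877bf142e134a6c0f00d4a  afdb_swissprot_v4.lookup
61ffaa1f49241f9ddce58a0d2872dccd  afdb_swissprot_v4.source
"""

bad = mismatched_files(parse_md5_listing(LOCAL), parse_md5_listing(EXPECTED))
print(bad)  # → ['afdb_swissprot_v4']
```

Only the main data file is flagged, which is consistent with an incomplete or corrupted download of that one file.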

valentynbez commented 1 year ago

Hello, @khb7840! You're right, it looks like something went wrong during the download. Everything works fine now, thanks for the support!