huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Segmentation fault when importing ESMFold and Tokenizers from transformers along with Pyrosetta #28797

Closed SIAndersson closed 8 months ago

SIAndersson commented 9 months ago

System Info

Reproduction

import pyrosetta
from transformers import AutoTokenizer, EsmForProteinFolding

Expected behavior

Expected behaviour: the module imports without issue. I can import pyrosetta on its own without issue. I can import the transformers modules without issue and run inference on PDB modules, as described in the protein structure prediction Jupyter notebook. I can do this without issue in a separate script. It is only when I import both that the segmentation fault occurs. The import order does not matter. Given that both work separately, I would expect them to work together as I cannot find any package conflicts.

ArthurZucker commented 9 months ago

cc @Rocketknight1 maybe we can reproduce?

Rocketknight1 commented 9 months ago

Hi @SIAndersson this is quite an unusual bug! We'll see what we can figure out - in the meantime, if you have any other machines you can test on, can you try it there?

Also, the EsmForProteinFolding model in Transformers doesn't import anything unusual, so it should behave like any other model in the library. To help figure out the issue, can you try:

1) Importing AutoTokenizer and EsmForProteinFolding separately to see which causes the issue
2) If the issue is EsmForProteinFolding, can you try importing another language model class like BertForSequenceClassification and let me know if the same issue occurs?
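
If it helps, one way to run checks like these is to try each import in a fresh interpreter, so a crash in one combination doesn't take down the rest. This is only a rough sketch: `run_import_check`, `CANDIDATES` and `diagnose` are made-up names, and it assumes a POSIX system, where a negative exit code means the child process was killed by a signal.

```python
import subprocess
import sys


def run_import_check(statement):
    """Run one import statement in a fresh interpreter.

    Returns (returncode, stderr). On POSIX, a return code of -N means the
    child was killed by signal N, so -11 (SIGSEGV on Linux) points to a
    segmentation fault.
    """
    result = subprocess.run(
        # -X faulthandler asks Python to dump a traceback on fatal signals.
        [sys.executable, "-X", "faulthandler", "-c", statement],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


# Combinations to try; the package and class names come from this thread.
CANDIDATES = [
    "import pyrosetta",
    "from transformers import AutoTokenizer",
    "from transformers import EsmForProteinFolding",
    "import pyrosetta; from transformers import AutoTokenizer",
    "import pyrosetta; from transformers import EsmForProteinFolding",
    "import pyrosetta; from transformers import BertForSequenceClassification",
]


def diagnose(candidates=CANDIDATES):
    """Print the exit status of each candidate, plus stderr on failure."""
    for statement in candidates:
        code, stderr = run_import_check(statement)
        print(f"{code:>4}  {statement}")
        if code != 0:
            print(stderr.strip())
```

Calling `diagnose()` prints one line per combination. With faulthandler enabled, the stderr of a crashing combination should include a Python-level traceback showing which import was running when the fault happened.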

SIAndersson commented 9 months ago

@Rocketknight1 Hi, thank you for the fast reply! I have tried importing both EsmForProteinFolding and AutoTokenizer separately, and changing which one I import first, but both result in the segmentation fault. I tried importing BertForSequenceClassification as well and it resulted in the same issue. Very strange issue! I even tried uninstalling and reinstalling both Transformers and PyRosetta to see if it would solve the issue, but it persists.

Rocketknight1 commented 9 months ago

Hi @SIAndersson, that's annoying! I tried, but unfortunately I can't actually get access to pyrosetta to reproduce the issue here - Hugging Face isn't an academic institution, so I can't get a free licence.

As a workaround, maybe you could run ESM and save the outputs, and then load them in another Python process to handle them with pyrosetta? I realize that's not very convenient, but I'm not sure what else to try because I'm kind of stuck when it comes to diagnosing the problem.

SIAndersson commented 9 months ago

@Rocketknight1 Ah, that's unfortunate!

I tried calling the model from a separate script instead of directly in the code and it worked without issue. It is a bit slower, but as long as it works, it's not a huge issue. Thank you for the help!
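
For anyone landing here with the same problem, a rough sketch of the separate-process approach is below. It is not the exact script used in this thread: `run_in_separate_process` and the worker script name `fold_worker.py` are hypothetical. The worker is assumed to take a sequence and an output path on the command line, import transformers, run ESMFold, and write the result to that path. The calling script only imports pyrosetta, so the two packages never share an interpreter.

```python
import subprocess
import sys
from pathlib import Path


def run_in_separate_process(worker_script, sequence, output_path):
    """Run worker_script in a fresh interpreter and return what it wrote.

    The worker is expected to accept two arguments, the input sequence and
    the path of the file to write. check=True raises CalledProcessError if
    the worker exits with a non-zero status or is killed by a signal.
    """
    subprocess.run(
        [sys.executable, str(worker_script), sequence, str(output_path)],
        check=True,
    )
    return Path(output_path).read_text()


# Hypothetical usage from the script that imports pyrosetta:
#   pdb_text = run_in_separate_process("fold_worker.py", "MKTAYIAK", "pred.pdb")
```

Starting a new interpreter for each call means the model is reloaded every time, which would explain the slowdown. Passing several sequences to the worker in one call would be one way to reduce that cost.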
