Closed multimeric closed 1 year ago
Hi Michael,
Below is an example of the required directory structure. Thanks for pointing it out; I will add it to the documentation.
If indeed this issue is due to the .command.run size, then I have an idea and will implement it in the next few days.
├── mgnify
│ └── mgy_clusters_2018_12.fa
├── alphafold_params_2022-03-02
│ ├── LICENSE
│ ├── params_model_1_multimer.npz
│ ├── params_model_1_multimer_v2.npz
│ ├── params_model_1.npz
│ ├── params_model_1_ptm.npz
│ ├── params_model_2_multimer.npz
│ ├── params_model_2_multimer_v2.npz
│ ├── params_model_2.npz
│ ├── params_model_2_ptm.npz
│ ├── params_model_3_multimer.npz
│ ├── params_model_3_multimer_v2.npz
│ ├── params_model_3.npz
│ ├── params_model_3_ptm.npz
│ ├── params_model_4_multimer.npz
│ ├── params_model_4_multimer_v2.npz
│ ├── params_model_4.npz
│ ├── params_model_4_ptm.npz
│ ├── params_model_5_multimer.npz
│ ├── params_model_5_multimer_v2.npz
│ ├── params_model_5.npz
│ └── params_model_5_ptm.npz
├── pdb70
│ └── pdb70_from_mmcif_200916
│   ├── md5sum
│   ├── pdb70_a3m.ffdata
│   ├── pdb70_a3m.ffindex
│   ├── pdb70_clu.tsv
│   ├── pdb70_cs219.ffdata
│   ├── pdb70_cs219.ffindex
│   ├── pdb70_hhm.ffdata
│   ├── pdb70_hhm.ffindex
│   └── pdb_filter.dat
├── pdb_mmcif
│ ├── mmcif_files
│ │ ├── 1g6g.cif
│ │ ├── 1go4.cif
│ │ ├── 1isn.cif
│ │ ├── 1kuu.cif
│ │ ├── 1m7s.cif
│ │ ├── 1mwq.cif
│ │ ├── 1ni5.cif
│ │ ├── 1qgd.cif
│ │ ├── 1tp9.cif
│ │ ├── 1wa9.cif
│ │ ├── 1ye5.cif
│ │ ├── 1yhl.cif
│ │ ├── 2bjd.cif
│ │ ├── 2bo9.cif
│ │ ├── 2e7t.cif
│ │ ├── 2fyg.cif
│ │ ├── 2j0q.cif
│ │ ├── 2jcq.cif
│ │ ├── 2m4k.cif
│ │ ├── 2n9o.cif
│ │ ├── 2nsx.cif
│ │ ├── 2w4u.cif
│ │ ├── 2wd6.cif
│ │ ├── 2wh5.cif
│ │ ├── 2wji.cif
│ │ ├── 2yu3.cif
│ │ ├── 3cw2.cif
│ │ ├── 3d45.cif
│ │ ├── 3gnz.cif
│ │ ├── 3j0a.cif
│ │ ├── 3jaj.cif
│ │ ├── 3mzo.cif
│ │ ├── 3nrn.cif
│ │ ├── 3piv.cif
│ │ ├── 3pof.cif
│ │ ├── 3pvd.cif
│ │ ├── 3q45.cif
│ │ ├── 3qh6.cif
│ │ ├── 3rg2.cif
│ │ ├── 3sxe.cif
│ │ ├── 3uai.cif
│ │ ├── 3uid.cif
│ │ ├── 3wae.cif
│ │ ├── 3wt1.cif
│ │ ├── 3wtr.cif
│ │ ├── 3wy2.cif
│ │ ├── 3zud.cif
│ │ ├── 4bix.cif
│ │ ├── 4bzx.cif
│ │ ├── 4c1n.cif
│ │ ├── 4cej.cif
│ │ ├── 4chm.cif
│ │ ├── 4fzo.cif
│ │ ├── 4i1f.cif
│ │ ├── 4ioa.cif
│ │ ├── 4j6o.cif
│ │ ├── 4m9q.cif
│ │ ├── 4mal.cif
│ │ ├── 4nhe.cif
│ │ ├── 4o2w.cif
│ │ ├── 4pzo.cif
│ │ ├── 4qlx.cif
│ │ ├── 4uex.cif
│ │ ├── 4zm4.cif
│ │ ├── 4zv1.cif
│ │ ├── 5aj4.cif
│ │ ├── 5frs.cif
│ │ ├── 5hwo.cif
│ │ ├── 5kbk.cif
│ │ ├── 5odq.cif
│ │ ├── 5u5t.cif
│ │ ├── 5wzq.cif
│ │ ├── 5x9z.cif
│ │ ├── 5xe5.cif
│ │ ├── 5ynv.cif
│ │ ├── 5yud.cif
│ │ ├── 5z5c.cif
│ │ ├── 5zb3.cif
│ │ ├── 5zlg.cif
│ │ ├── 6a6i.cif
│ │ ├── 6az3.cif
│ │ ├── 6ban.cif
│ │ ├── 6g1f.cif
│ │ ├── 6ix4.cif
│ │ ├── 6jwp.cif
│ │ ├── 6ng9.cif
│ │ ├── 6ojj.cif
│ │ ├── 6s0x.cif
│ │ ├── 6sg9.cif
│ │ ├── 6vi4.cif
│ │ └── 7sp5.cif
│ └── obsolete.dat
├── pdb_seqres
│ └── pdb_seqres.txt
├── small_bfd
│ └── bfd-first_non_consensus_sequences.fasta
├── uniclust30
│ └── uniclust30_2018_08
│   ├── uniclust30_2018_08_a3m_db -> uniclust30_2018_08_a3m.ffdata
│   ├── uniclust30_2018_08_a3m_db.index
│   ├── uniclust30_2018_08_a3m.ffdata
│   ├── uniclust30_2018_08_a3m.ffindex
│   ├── uniclust30_2018_08.cs219
│   ├── uniclust30_2018_08_cs219.ffdata
│   ├── uniclust30_2018_08_cs219.ffindex
│   ├── uniclust30_2018_08.cs219.sizes
│   ├── uniclust30_2018_08_hhm_db -> uniclust30_2018_08_hhm.ffdata
│   ├── uniclust30_2018_08_hhm_db.index
│   ├── uniclust30_2018_08_hhm.ffdata
│   ├── uniclust30_2018_08_hhm.ffindex
│   └── uniclust30_2018_08_md5sum
├── uniprot
│ └── uniprot.fasta
└── uniref90
  └── uniref90.fasta
Yeah, so your directory seems to have the same structure; it's just that we have thousands of mmCIF files. I'm surprised that you don't?
$ ls pdb_mmcif/mmcif_files | head
100d.cif
101d.cif
101m.cif
102d.cif
102l.cif
102m.cif
103d.cif
103l.cif
103m.cif
104d.cif
$ ls pdb_mmcif/mmcif_files | wc -l
183793
Considering this, I think finding a solution to the symlinking issue would be ideal.
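As a rough back-of-envelope check of why 183,793 files is a problem, the sketch below assumes that the staging part of .command.run holds about one "ln -s <source> <name>" line per staged file. That line format and the /path/to/alphafold path are hypothetical stand-ins; Nextflow's exact staging code and the real cluster paths will differ.

```shell
#!/usr/bin/env bash
# Back-of-envelope estimate of a staging script's size, ASSUMING one
# "ln -s <source> <name>" line per staged file (hypothetical format).
estimate_stage_bytes() {
  local n_files=$1 sample_line=$2
  # ${#sample_line} is the line length in characters (= bytes for ASCII);
  # add 1 for the trailing newline, then multiply by the number of files.
  echo $(( n_files * (${#sample_line} + 1) ))
}

# 183793 files (from "ls pdb_mmcif/mmcif_files | wc -l") with a
# hypothetical 64-character symlink line:
estimate_stage_bytes 183793 "ln -s /path/to/alphafold/pdb_mmcif/mmcif_files/100d.cif 100d.cif"
# prints 11946545 (about 12 MB)
```

Even with this short, made-up path the symlink lines alone come to roughly 12 MB. Longer absolute paths on a real cluster push that toward the 20 MB sbatch script described in the original bug report.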
Also, might it be possible to document the AlphaFold version that is used by each pipeline release? We have several versions of AlphaFold installed, each with different databases, and we need to know which version should be used with proteinfold.
I do too. What I pasted is a reduced version of the databases I use for testing, so that you could see the structure.
Let me know whether #89 works so that I can close this issue.
I assume that the fix worked, so I'm closing the issue. Please feel free to reopen it if it didn't work.
Hi @athbaltzis, sorry for the late reply.
It doesn't look like this fix worked. I re-ran the pipeline, and it failed with sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long.
I've attached Nextflow's output below:
I've also attached the submit script that demonstrates this behaviour: .command.run.txt
Hi @multimeric, this should be fixed in the most recent edge version of Nextflow (23.05.0-edge); you can find the corresponding issue here. Maybe you can give it a try by updating Nextflow, or by prefixing your command with the version: NXF_VER='23.05.0-edge' nextflow run ...
Let us know if this works for you.
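Before trying the workaround, it can help to check whether the installed Nextflow already meets the 23.05.0-edge minimum mentioned above. The version_at_least helper below is a hypothetical sketch, not part of Nextflow or proteinfold. It relies on GNU sort -V for version ordering, and you would pass it the version printed by nextflow -version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if version $1 >= version $2.
# Relies on GNU "sort -V" (version sort); not part of Nextflow itself.
version_at_least() {
  local have=$1 need=$2
  # Sort both versions; if the smaller one is the required version,
  # the installed version is at least as new.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Example: replace 23.04.1 with the version reported by "nextflow -version".
if ! version_at_least 23.04.1 23.05.0-edge; then
  echo "Nextflow is older than 23.05.0-edge; try NXF_VER='23.05.0-edge' nextflow run ..."
fi
```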
I guess the rocket means it worked, so I will close the issue again.
Description of the bug
We want to run proteinfold on our cluster, where we already have the AlphaFold data. However, when we pass the --alphafold2_db flag to point to this data, proteinfold tries to symlink in the thousands of files located in that directory tree. This causes obvious issues.
Detailed Explanation
Command output: sbatch: error: Batch job submission failed: Pathname of a file, directory or other parameter too long
This happens because the excessive symlinking results in a 20 MB sbatch script. In other environments it would manifest differently.
Command used and terminal output
Relevant files
The .command.run file that causes the issues: command.zip
System information
proteinfold 1.0.0