Closed JohannesUniUlm closed 11 months ago
I am afraid I cannot upload the file. Why doesn't GitHub accept .tgz files?
@JohannesUniUlm: Thanks for your report. Here is the upload in question:
It seems to be a combination of at least two things:
I will have a look at this and report back to you. Once a solution is provided, I will update our beta deployment. You can then proceed with the upload there. The data, datasets and DOIs from the beta are all valid and will be available in the production version later as well.
Dear @JohannesUniUlm,
I have now deployed a fix into our beta site at: https://nomad-lab.eu/prod/v1/staging/gui/ This fixes the problem in displaying the mainfile name and the parsing issue with your VASP data. Could you try it out and report back here?
Unfortunately, we do not seem to be properly handling the nomad.json/nomad.yaml files, as the coauthors and datasets do not get correctly updated. Comments and references do seem to work. I will make this into a separate issue. While we work on this, I would suggest using the GUI to add the datasets and coauthors to your upload:
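For reference, an upload-root metadata file can be written with a heredoc. This is a hypothetical sketch: the key names (`comment`, `references`, `coauthors`, `datasets`) and the example values are assumptions based on my reading of the NOMAD docs, so verify them against the current documentation before relying on this, especially while the handling bug mentioned above is open.

```shell
# Hypothetical sketch of a nomad.yaml metadata file placed at the upload
# root; all keys and values below are assumptions, not verified API facts.
cat > nomad.yaml <<'EOF'
comment: VASP test calculations
references:
  - https://example.org/my-paper      # hypothetical reference URL
coauthors:
  - coauthor@example.org              # hypothetical coauthor address
datasets:
  - my_vasp_dataset                   # hypothetical dataset name
EOF
cat nomad.yaml
```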
Dear Lauri,
the mainfile of my upload is now correctly recognized by the beta site. The upload also appears in the release version, which displays "Process status: Success", and all data is correctly extracted.
How shall I proceed with my uploads?
Thank you very much!
Best, Johannes
Now you can 1) use the frontend to browse the processed data, and 2) use the provided API to query the processed data and feed it into subsequent data analyses.
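Step 2 above, querying the processed data via the API, could look roughly like the sketch below. The `POST /entries/query` endpoint and the metadata key `results.method.simulation.program_name` are my assumptions about the v1 API; check both against the API documentation before use.

```shell
# A minimal query sketch against the NOMAD v1 API (endpoint and metadata
# key are assumptions; verify them in the API docs).
QUERY_URL="https://nomad-lab.eu/prod/v1/api/v1/entries/query"
# Dry run: the command is printed, not executed; drop `echo` to send it.
echo curl -s -X POST "$QUERY_URL" \
  -H "Content-Type: application/json" \
  -d '{"query": {"results.method.simulation.program_name": "VASP"}, "pagination": {"page_size": 5}}'
```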
@JohannesUniUlm: Here might be some steps to consider:
Dear Lauri,
thank you very much for your help. It seems like my data is now correctly processed by the beta version. Now I am facing a problem concerning the amount of data. I would like to
i) upload several thousand individual calculations stored in individual .tgz files,
ii) assign a dataset and other metadata to all files, and
iii) publish the files altogether.
I am obviously not going to do the upload manually. I used a bash script with curl, but I get an error message after the 10th file, since this is the maximum number of unpublished uploads I am allowed to have. What would you suggest? Can't I use curl to add several files to one upload?
Best, Johannes
Dear Johannes,
You can bundle several files together by compressing them as .zip or .tar files. This will also help with the upload process, as the upload size will be smaller. E.g. to zip single/multiple folders/files, run the following command:
zip -r <zip_filename> <filepath1> <filepath2>
Then you can upload the resulting zip file using the curl command. Our processing will automatically unzip the contents into your upload. There are also ways to add files to an existing upload; you can find more details in our API documentation.
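Putting the two steps together, a bundle-and-upload run could look like the sketch below. It assumes the v1 upload endpoint (`POST /uploads`) accepts the archive as the request body with bearer-token authentication; the token variable is a placeholder, and the exact call should be checked against the API documentation.

```shell
# A minimal sketch, assuming POST /uploads takes the archive as the
# request body and a bearer token for auth (verify in the API docs).
mkdir -p calcs/run1
: > calcs/run1/OUTCAR                 # stand-in for a real calculation file
tar -czf my_upload.tgz calcs          # a .zip archive works equally well
TOKEN="${NOMAD_TOKEN:-<your-api-token>}"   # hypothetical; substitute your own
# Dry run: the command is printed, not executed; drop `echo` to upload.
echo curl -X POST "https://nomad-lab.eu/prod/v1/api/v1/uploads" \
  -H "Authorization: Bearer $TOKEN" \
  -T my_upload.tgz
```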
The size of a single upload is limited to 32 GB. Maybe you can even fit all of your calculations into a single .zip file? If not, you can break it into 32 GB .zip files and upload them individually. If you still hit the limit of 10 unpublished uploads (meaning that you have more than 320 GB of data to upload), it is possible to publish some of the earlier uploads, but we should in this case probably discuss an alternative solution for the transfer.
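Breaking a large set of calculations into several archives can be scripted. Below is a minimal count-based sketch; a size-aware split would better target the 32 GB limit, and the directory layout (one subdirectory of `calcs/` per calculation) is an assumption.

```shell
# Count-based split of per-calculation directories into archives
# (size-aware splitting would track the 32 GB limit more precisely).
mkdir -p calcs/a calcs/b calcs/c      # stand-in calculation directories
n=2                                   # directories per archive
i=0; batch=0
for d in calcs/*/; do
  [ $((i % n)) -eq 0 ] && batch=$((batch + 1))
  tar -rf "part_${batch}.tar" "$d"    # -r appends, creating the archive if needed
  i=$((i + 1))
done
ls part_*.tar
```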
Dear Lauri,
alright, thank you. Yes, I understood that I can bundle all files together and then use curl; I had just hoped there was some way to upload individual files into one upload using curl as well. This would have fit better into my local folder structure.
Thanks anyways!
Best,
Johannes
From: Lauri Himanen @.> Sent: Tuesday, 25 July 2023 07:45 To: nomad-coe/nomad @.> Cc: JohannesUniUlm @.>; Mention @.> Subject: Re: [nomad-coe/nomad] Problems in file processing (Issue #78)
Dear Johannes,
There is a way: you can upload individual files to a specific existing upload using the PUT /uploads/{upload_id}/raw/{path} API endpoint. You can find the documentation here.
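A call to that PUT endpoint could look like the sketch below. The upload id, token, and target path are placeholders, and the bearer-token form of authentication is an assumption; check the API documentation for the exact parameters.

```shell
# A minimal sketch of PUT /uploads/{upload_id}/raw/{path}; upload id,
# token, and path are placeholders (verify details in the API docs).
UPLOAD_ID="${UPLOAD_ID:-<your-upload-id>}"
TOKEN="${NOMAD_TOKEN:-<your-api-token>}"
: > OUTCAR                            # stand-in file to add to the upload
URL="https://nomad-lab.eu/prod/v1/api/v1/uploads/$UPLOAD_ID/raw/run1/OUTCAR"
# Dry run: the command is printed, not executed; drop `echo` to send it.
echo curl -X PUT "$URL" -H "Authorization: Bearer $TOKEN" -T OUTCAR
```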
Dear NOMAD team,
I encountered several problems with the attached test upload:
I have been trying to solve this issue for quite a while now, but I have not succeeded. Do you have any advice? I would like to upload several thousand calculations soon, all with names, folder structure, etc. similar to the test upload.
Thank you very much for your help!
Best,
Johannes