apaala closed this issue 4 years ago.
This is blocked by the upload-to-Google-Cloud functionality not working. Log file processing has been implemented, but I am unable to test start to finish.
@jorvis
Here is the command I am using; maybe it will work for you:

/usr/local/common/Python-3.7.2/bin/python3 /local/projects-t2/achatterjee/analytics/cron_uploader/nemo_upload_crawler.py -ilb /local/scratch/achatterjee/NEMO/Converter/2IN/ -ob /local/scratch/achatterjee/NEMO/Converter/2Out/
The error I get when I try to do the upload:

INFO: Uploading these files to the cloud bucket: /local/scratch/achatterjee/NEMO/Converter/2Out/6d6a645f-4493-4115-b7a2-833b4caa26fb.h5ad, /local/scratch/achatterjee/NEMO/Converter/2Out//6d6a645f-4493-4115-b7a2-833b4caa26fb.json
ERROR: Failed to process file: /local/scratch/achatterjee/NEMO/BrainSpanBulkDevo.tar.gz

The log file lives here: /local/projects-t3/NEMO/cron_upload_log/
The cron shell script that calls the uploader is in place here: /local/projects-t3/NEMO/cron_upload_log/. We need to decide where the output files (h5ad) will be stored before uploading to the server.
@jorvis I am not sure if this is ready to be put in place yet; are we in a position to set up the cron? @carlocolantuoni was asking about some datasets he needs uploaded, and I am trying to figure out if I should just run it manually or wait for the cron.
As long as logging is in place where we can undo operations as needed, it's fine to install the cron.
@jorvis ok I will talk to @victor73 about getting it in place.
@carlocolantuoni I have set up the cron; it will run at 2:30 AM every day. I will check the logs tomorrow to see if it worked and update here.
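For reference, a daily 2:30 AM crontab entry would look something like this. This is only a sketch: the wrapper script path is the one mentioned later in this thread, but the log redirect filename is an assumption.

```shell
# Run the NeMO uploader wrapper every day at 2:30 AM.
# minute hour day-of-month month day-of-week   command
30 2 * * * /local/projects-t3/NEMO/bin/cron/uploader.sh >> /local/projects-t3/NEMO/cron_upload_log/uploader.log 2>&1
```

Appending stdout and stderr to a file under the cron_upload_log directory keeps a record we can use to undo operations as needed.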
great - thanks!
Got an error when it tried to run the command set up in cron:
Traceback (most recent call last):
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2656, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Type'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/local/projects-t3/NEMO/bin/cron/analytics/cron_uploader/nemo_upload_crawler.py", line 311, in
@jorvis @carlocolantuoni Got this error. Not sure what the problem was... it looks like it was not able to find the 'Type' header in the file, but the file seems to have it.
The cron error was:
/local/projects-t3/NEMO/bin/cron/uploader.sh: line 3: gcloud: command not found
Traceback (most recent call last):
File "/local/projects-t3/NEMO/bin/cron/analytics/cron_uploader/nemo_upload_crawler.py", line 37, in
@jorvis Resolved the first error by adding *diff.log as the identifying string; there was a log file of a different format that threw the code off. Now the processing stops because it cannot find a file, which may have been moved.
INFO: Extracting dataset at path: /local/projects-t3/NEMO/dmz/brain/biccn/grant/devhu/transcriptome/scell/processed/counts/GW22_somato2/GW22_somato2.mex.tar.gz
Traceback (most recent call last):
File "/local/projects-t3/NEMO/bin/cron/analytics/cron_uploader/nemo_upload_crawler.py", line 311, in
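The fix described above (keying on the *diff.log suffix to skip log files of other formats) could be sketched like this; the function name and example filenames are hypothetical, not the crawler's actual code:

```python
from fnmatch import fnmatch

def find_upload_logs(filenames):
    """Keep only log files matching the '*diff.log' naming pattern,
    skipping other log formats that would confuse the parser."""
    return [f for f in filenames if fnmatch(f, "*diff.log")]

# Example: only the diff log survives the filter.
files = ["2020-04-03.diff.log", "upload_summary.log", "GW22_somato2.mex.tar.gz"]
print(find_upload_logs(files))  # ['2020-04-03.diff.log']
```

Filtering on an explicit suffix up front means an unexpected log format fails quietly into the "ignored" bucket instead of raising a KeyError deep inside the parser.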
I didn't move any files... I don't know how that happened?
@carlocolantuoni it may not be your files that got moved. We are trying to process all datasets through the cron; once that happens, it will only process newly added datasets.
Hey @apaala, could you let me know when the cron uploader is working? Thanks.
The cron error was:
/local/projects-t3/NEMO/bin/cron/uploader.sh: line 3: gcloud: command not found
Traceback (most recent call last):
  File "/local/projects-t3/NEMO/bin/cron/analytics/cron_uploader/nemo_upload_crawler.py", line 37, in <module>
    from gear.dataarchive import DataArchive
  File "/local/projects-t3/NEMO/bin/cron/gEAR/lib/gear/dataarchive.py", line 5, in <module>
    import scanpy.api as sc
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/scanpy/__init__.py", line 31, in <module>
    from . import tools as tl
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/scanpy/tools/__init__.py", line 12, in <module>
    from ._sim import sim
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/scanpy/tools/_sim.py", line 19, in <module>
    from .. import readwrite
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/scanpy/readwrite.py", line 9, in <module>
    import tables
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/tables/__init__.py", line 93, in <module>
    from .utilsextension import (
ImportError: /lib64/libstdc++.so.6: version `CXXABI_1.3.9' not found (required by /usr/local/common/Python-3.7.2/lib/python3.7/site-packages/tables/utilsextension.cpython-37m-x86_64-linux-gnu.so)
@adkinsrs This error is persisting, @victor73 said it seems like an environment issue.
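Both failures are consistent with cron's minimal environment: cron jobs inherit almost no PATH, so gcloud is not found, and the system /lib64/libstdc++.so.6 predates CXXABI_1.3.9, which PyTables needs. One common workaround is to set the environment explicitly at the top of the wrapper script. This is only a sketch; the gcloud SDK and newer libstdc++ install locations below are assumptions, not paths confirmed in this thread:

```shell
#!/bin/bash
# Hypothetical additions to uploader.sh: cron starts with a minimal
# environment, so put the gcloud SDK and a libstdc++ that provides
# CXXABI_1.3.9 on the search paths explicitly. Both locations below
# are assumptions and would need to match the actual installs.
export PATH="/usr/local/google-cloud-sdk/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/gcc/lib64:${LD_LIBRARY_PATH}"

/usr/local/common/Python-3.7.2/bin/python3 \
    /local/projects-t3/NEMO/bin/cron/analytics/cron_uploader/nemo_upload_crawler.py "$@"
```

Verifying with `env -i /bin/bash /local/projects-t3/NEMO/bin/cron/uploader.sh` (an empty environment, roughly what cron provides) would reproduce the failure mode outside the cron schedule.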
Was talking to @apaala about this ticket and #85. Given that the ingest scripts will be updated soon, it may be best to use the .diff files generated in the "dmz" area instead of the bundling output, since those files note any files that may have moved in addition to newly bundled files.
Currently I have not heard of any additional plans to upload NeMO Archive data; we are still only uploading GEO submission data. So I am closing this for now and will reopen if needed.
Set up cron job once it is confirmed that we are good to go and the log file component is ready