nemoarchive / analytics

Repository for the NeMO Analytics project.
MIT License

Process New DS sent over by Carlo #91

Closed: apaala closed this issue 4 years ago

apaala commented 4 years ago

/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/BrainSpanBulkDevo/

/autofs/encrypted/NEMO/incoming/brain/other/grant/development/Huttner/HuttCtxDevoLMDhs/

apaala commented 4 years ago

@carlocolantuoni Here is the error for one of the datasets:

>>> adata = sc.read("OUT/BrainSpanBulkDevo/BrainSpanBulkDevo_DataMTX.tab", cache=False).transpose()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/scanpy/readwrite.py", line 78, in read
    backup_url=backup_url, cache=cache, **kwargs)
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/scanpy/readwrite.py", line 470, in _read
    adata = read_text(filename, delimiter, first_column_names)
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/anndata/readwrite/read.py", line 241, in read_text
    return _read_text(f, delimiter, first_column_names, dtype)
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/anndata/readwrite/read.py", line 352, in _read_text
    dtype=dtype)
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/anndata/base.py", line 666, in __init__
    filename=filename, filemode=filemode)
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/anndata/base.py", line 868, in _init_as_actual
    self._check_dimensions()
  File "/usr/local/common/Python-3.7.2/lib/python3.7/site-packages/anndata/base.py", line 1938, in _check_dimensions
    .format(self._n_vars, self._var.shape[0]))
ValueError: Variables annot. `var` must have number of columns of `X` (498), but has 996 rows.

apaala commented 4 years ago

The other file did not give me this error... I was able to make an h5ad object out of it. It has not been pushed to the Google server yet.

carlocolantuoni commented 4 years ago

It looks like it's detecting twice as many columns (rows, once it's transposed) as there should be. However, there are 498 columns in the DataMTX.tab file when I open it, as there should be. Do you see 498 columns/samples when you open the .tab file? Any idea why it thinks there are 996? Or is there another problem that I'm not understanding in these errors?
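
A plausible mechanism, assuming anndata's text reader falls back to splitting each line on any whitespace when no explicit delimiter is given: sample names that contain spaces each become two header tokens, so 498 names are counted as 996, while the numeric data rows still parse to 498 columns. A minimal sketch of that failure mode:

    # Assumption: with no explicit delimiter, the text reader splits each
    # line on any whitespace, not just tabs. A header whose sample names
    # contain spaces then yields twice as many tokens as real columns.
    header = "GeneID\t" + "\t".join("sample %d" % i for i in range(498))

    tokens_whitespace = header.split()   # what a whitespace split sees
    tokens_tab = header.split("\t")      # what a tab-aware split sees

    print(len(tokens_whitespace) - 1)    # 996 apparent column names
    print(len(tokens_tab) - 1)           # 498 real column names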


carlocolantuoni commented 4 years ago

/autofs/encrypted/NEMO/incoming/brain/other/grant/development/AllenBrain/BrainSpanBulkDevo


apaala commented 4 years ago

@carlocolantuoni I was able to process the first file too after editing the metadata and data matrix files. Brian was right: the spaces were throwing it off.
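
For reference, a sketch of the kind of fix applied here, under assumptions (the exact edits are not recorded in this thread; the path is the one quoted above): read the matrix with an explicit tab separator so embedded spaces cannot split fields, replace the spaces in the column names, rewrite the file, and then the original scanpy call and the h5ad export go through.

    import pandas as pd
    import scanpy as sc

    src = "OUT/BrainSpanBulkDevo/BrainSpanBulkDevo_DataMTX.tab"

    # Read with an explicit tab separator; embedded spaces stay inside fields.
    df = pd.read_csv(src, sep="\t", index_col=0)

    # Replace spaces in sample names so whitespace-based readers stay consistent.
    df.columns = [c.replace(" ", "_") for c in df.columns]
    df.to_csv(src, sep="\t")

    # Now the original call succeeds and the result can be saved as h5ad.
    adata = sc.read(src, cache=False).transpose()
    adata.write("BrainSpanBulkDevo.h5ad")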

carlocolantuoni commented 4 years ago

Great - should we avoid spaces in column names in the future?

Or is it better to fix this in the processing script?

Thanks, Carlo


jorvis commented 4 years ago

For now I'd say avoid spaces then, especially given how many dependent libraries are at work here, which might also have issues with them.

carlocolantuoni commented 4 years ago

OK, but I'm guessing spaces will come up in other instances as well, so we might want to make it robust when we get a chance.
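
A minimal sketch of what such a robust step could look like in the processing script, assuming a tab-delimited matrix with the column names on the first line (the helper name is hypothetical):

    import re

    def sanitize_header(path):
        """Rewrite a tab-delimited matrix in place so that no column name
        contains whitespace (hypothetical helper for the processing script)."""
        with open(path) as fh:
            lines = fh.readlines()
        fields = lines[0].rstrip("\n").split("\t")
        # Collapse runs of whitespace inside each name into a single underscore.
        lines[0] = "\t".join(re.sub(r"\s+", "_", f.strip()) for f in fields) + "\n"
        with open(path, "w") as fh:
            fh.writelines(lines)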


jorvis commented 4 years ago

I agree.

https://github.com/jorvis/gEAR/issues/627

apaala commented 4 years ago

@carlocolantuoni @jorvis

I was able to successfully upload the files to Google (I had been trying to use the key I made instead of nemo-analytics__archive-file-transfer.json). I have processed BrainSpanBulkDevo and HuttCtxDevoLMDhs (without tab analysis) and pushed them to the bucket. In order for @carlocolantuoni to see them, @jorvis will need to start a cron on the server.
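
A minimal sketch of that upload step, assuming the google-cloud-storage Python client; only the key filename comes from this thread, and the bucket and object names are hypothetical stand-ins:

    from google.cloud import storage

    # The key filename is from this thread; bucket/object names are hypothetical.
    client = storage.Client.from_service_account_json(
        "nemo-analytics__archive-file-transfer.json")
    bucket = client.bucket("nemo-analytics-incoming")  # hypothetical bucket
    blob = bucket.blob("datasets/BrainSpanBulkDevo.h5ad")
    blob.upload_from_filename("BrainSpanBulkDevo.h5ad")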

If Carlo wants to check out the uploaded files (which were processed ignoring the analysis), @jorvis can set it up quickly. If we want to wait until Joshua incorporates the analysis, it might take some more time.

We also need to decide whether the cron should start now, once the analysis is incorporated, or at some later date. I will set up a chat with @victor73 to discuss the details accordingly.

carlocolantuoni commented 4 years ago

I think we should go ahead and look at things now, ignoring the extra analyses, which will take a while.

I'll take a peek tonight unless Joshua still needs to "start a cron on the server" as Apaala mentioned.

Let me know if I should proceed or wait.

Thanks, Carlo


carlocolantuoni commented 4 years ago

I can't see any of the new data, so I'm guessing there's another step to be done. Let me know when they are up in NeMO. Thanks.

apaala commented 4 years ago

@carlocolantuoni @jorvis will probably be able to let you know how much time it will take, but when I last spoke with him he mentioned that it should be quick. From what I understand, he needs to move the files out of the bucket so they're accessible to gEAR.

carlocolantuoni commented 4 years ago

Hey Apaala and Joshua - I thought the new datasets were pulled in past this step and ready in NeMO Analytics, but I can't see them. Did they not make it all the way through after the correction of the metadata, etc.?

apaala commented 4 years ago

@carlocolantuoni The last time I communicated with @jorvis, I think there was some confusion around which files we wanted to use... We had edited the metadata and re-uploaded it. In order to re-upload it, I had to delete the lines from the log file that had the previous UIDs. The two new UIDs of the fixed, re-uploaded datasets are db69f24f-ca14-4a38-9423-57af5c5b0f14 and 339aeb36-d3ad-48bf-9ce9-688189331689, and the one that was uploaded earlier is c96c6990-55b9-48a0-b7f1-08157487a7af.
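
A sketch of that log cleanup, with a hypothetical log path and placeholder UID (the thread does not list the old UIDs that were removed):

    # Drop log entries referencing the superseded upload UIDs so the
    # re-upload isn't treated as a duplicate. The log file name and the
    # UID below are hypothetical stand-ins.
    stale_uids = {
        "00000000-0000-0000-0000-000000000000",  # placeholder old UID
    }

    with open("upload.log") as fh:
        kept = [line for line in fh
                if not any(uid in line for uid in stale_uids)]

    with open("upload.log", "w") as fh:
        fh.writelines(kept)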

carlocolantuoni commented 4 years ago

Hey Joshua, do you know where this is currently stuck? Carlo


apaala commented 4 years ago

@carlocolantuoni I think Joshua is out until Friday.

jorvis commented 4 years ago

You should be able to see these now. Please re-open if there are any issues.

carlocolantuoni commented 4 years ago

FYI - it looks like dataset "HuttCtxDevoLMDhs" was pulled in twice.

jorvis commented 4 years ago

Apaala told me she regenerated them, but I failed to remove the older one. I saw both in the database just now and removed the one you hadn't created any curations with.