norment / tsd_issues

Repo to track issues with TSD as tickets

S3-api permission issue #61

Closed · denvdm closed this 3 years ago

denvdm commented 3 years ago

Getting permission errors when attempting to use the s3-API on NIRD to import, export, or perform any other action. This worked fine last month. The exact error is pasted below:

ERROR: Error parsing xml: mismatched tag: line 6, column 2
ERROR: b'<html>\r\n<head><title>401 Authorization Required</title></head>\r\n<body>\r\n<center><h1>401 Authorization Required</h1></center>\r\n<hr><center>nginx/1.16.1</center>\r\n</body>\r\n</html>\r\n'
ERROR: S3 error: 401 (Unauthorized)
ofrei commented 3 years ago

@denvdm have you tried the tacl API client? https://www.uio.no/english/services/it/research/sensitive-data/use-tsd/import-export/import-data-using-the-tsd-api.html My feeling is that the TSD team gives low priority to maintaining tsd-s3cmd. I confirmed last week that tsd-s3cmd is still officially supported, but they advise users to go for tacl whenever possible, and in the long term TSD may consider deprecating tsd-s3cmd. If there are things that tacl can't do, we can push for maintaining tsd-s3cmd, but I'd like to check whether there is a real need for this.

Could you please try tacl next time you import/export, and tell us here if it doesn't cover your needs or is a step back compared to tsd-s3cmd?

ofrei commented 3 years ago

For me tacl looks promising, but currently it doesn't work. I've submitted an RT ticket #4242337.

>tacl p33 --upload CorticalArea.csv
CorticalArea.csv |################################| 100%
401 Client Error: Unauthorized for url: https://api.tsd.usit.no/v1/p33/files/stream/CorticalArea.csv?group=p33-member-group
The request was unsuccesful. Exiting.
ofrei commented 3 years ago

I've solved my tacl --upload problem by running tacl p33 --session-delete. Now it works, also for uploads of folders.

Some more detail: I first upgraded to tacl v3.3.1, but still got the same error. Then I tried uploading to p697 - this worked, but uploads to p33 still failed. I tried registering tacl with the p33 project again using "tacl --register" - this worked, but did not resolve the upload problem. Finally, I ran "tacl p33 --session-delete", and that helped. It's a bit strange - the issue persisted for a few days and I had to re-type my password & OTP many times, so there must be some state that is only cleaned up by --session-delete.
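For the record, my working theory (just a guess - I haven't read the tacl source, and the file names below are made up) is that the client keeps a per-project session file on disk and reuses it until it is explicitly deleted, so re-typing the password & OTP never touches the stale state. A minimal sketch of that pattern:

import json
import os

# Hypothetical illustration only - not tacl's actual implementation or paths.
SESSION_FILE = os.path.expanduser("~/.example-client/session_p33.json")

def get_session(authenticate):
    """Return the cached session if present, otherwise authenticate and cache it."""
    if os.path.exists(SESSION_FILE):
        with open(SESSION_FILE) as f:
            return json.load(f)  # reused even if the server no longer accepts it
    session = authenticate()  # this is where the password & OTP prompt happens
    os.makedirs(os.path.dirname(SESSION_FILE), exist_ok=True)
    with open(SESSION_FILE, "w") as f:
        json.dump(session, f)
    return session

def delete_session():
    """The equivalent of --session-delete: drop the cached state entirely."""
    if os.path.exists(SESSION_FILE):
        os.remove(SESSION_FILE)

In a setup like this, re-authenticating never helps because get_session() never reaches the authenticate() call while the stale file exists.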

denvdm commented 3 years ago

Hey Alex, it looks indeed promising and no, I hadn't tried it yet. I did one successful test run just now and all seems good. However… I now need to import ~5TB of UKB diffusion imaging data. Would tacl be up to this? I.e. can it handle such a load (7k zips, each about 700MB), is the default import dir an acceptable location for this, and how about speed (this is coming from NIRD)? Thanks. Best, Dennis

denvdm commented 3 years ago

tacl worked very well - very pleasant - and it was able to copy almost half (3k, >2TB) of the zips from NIRD to the import dir. However, I then got kicked out of NIRD; after reconnecting and attempting to restart the upload I received the error below. Of course I checked, and the file does exist (with every attempt, it complains about a different file). Any idea what is going on here? Out of frustration I also tried the --session-delete mentioned in the previous mails, but I get the same error after re-authenticating. Thanks for any insights. Best, Dennis

ofrei commented 3 years ago

@denvdm eh, too bad... what was the error? I don't see it attached. Also, do you use a screen session? It's best to run the sync within screen to make sure it survives a disconnect. But that's no excuse for tacl not being able to resume the session - tacl should resume just fine, let's investigate why it doesn't.

denvdm commented 3 years ago

Ha! I figured it out: as per usual it was stupidity on my side, and your 'screen' remark was the key... I did use 'screen', which you taught me a few years back (and it has been a lifesaver for these long copying jobs), so when I got kicked out of NIRD, the job of course continued. However, I didn't think of this and tried to restart the still-running process, which must be what caused the error. In the meantime I checked the import dir and indeed the number of files is still growing. Just to satisfy any curiosity, I have attached the error screenshot as a file rather than the image I pasted earlier. By the way, I think I just assumed the job crashed because it had done that already a few (3-4) times earlier; not really a big deal, and those times it did simply continue when resubmitting the upload command.

PastedGraphic-5

denvdm commented 3 years ago

Re. the above: unfortunately, it turns out I did not figure it out. The error in the end did not seem to be caused by trying to restart a running process. The transfer this time is definitely dead (the number of files in the import dir hasn't increased in hours), and when I try to start the tacl upload again (tacl p33 --upload 20250) I am still getting the same error, pasted below (btw, that file is definitely present). Any thoughts?

File "/nird/home/dennisva/.local/bin/tacl", line 11, in sys.exit(cli()) File "/nird/home/dennisva/.local/lib/python3.6/site-packages/click/core.py", line 829, in call return self.main(args, kwargs) File "/nird/home/dennisva/.local/lib/python3.6/site-packages/click/core.py", line 782, in main rv = self.invoke(ctx) File "/nird/home/dennisva/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke return ctx.invoke(self.callback, ctx.params) File "/nird/home/dennisva/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke return callback(args, **kwargs) File "/nird/home/dennisva/.local/lib/python3.6/site-packages/tsdapiclient/tacl.py", line 518, in cli uploader.sync() File "/nird/home/dennisva/.local/lib/python3.6/site-packages/tsdapiclient/sync.py", line 286, in sync self._transfer(resource, integrity_reference=integrity_reference) File "/nird/home/dennisva/.local/lib/python3.6/site-packages/tsdapiclient/sync.py", line 585, in _transfer resource, integrity_reference=integrity_reference File "/nird/home/dennisva/.local/lib/python3.6/site-packages/tsdapiclient/sync.py", line 425, in _transfer_local_to_remote if os.stat(resource).st_size > CHUNK_THRESHOLD: FileNotFoundError: [Errno 2] No such file or directory: '20250//3138077_20250_2_0.zip'

ofrei commented 3 years ago

@denvdm Yeah, this is weird. The file definitely exists (I know where to look - checked just now), and "chmod" permissions are fine.

I've noticed that the 20250 folder within TSD had exactly 5000 files - sounds like some sort of limit. Could you please submit a ticket to TSD-drift? And please add a link to this github ticket - there's a good discussion here....

I've tried removing one file (the text file with the field lists) and then re-running the sync with tacl p33 --upload-sync 20250. First problem: this took ~15 minutes in the "fetching information about directory" step. That slow performance is a real issue - --upload-sync shouldn't spend ~15 minutes to go over just ~5000 files, when `rsync` can do it in less than a second.

$ tacl p33 --upload-sync 20250
uploading directory 20250
fetching information about directory: 20250
fetching information about directory: 20250
fetching information about directory: 20250
fetching information about directory: 20250

Finally, after ~15 minutes, tacl started copying the same files you already had in the target folder on TSD - so it still has 4999 files and I couldn't validate my theory about the 5000-file limit. That's another problem - why are the same files synced again even though I'm running --upload-sync, not --upload?
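What I'd expect --upload-sync to do, roughly, is compare the local and remote listings and only transfer what differs. A sketch of that expectation (hypothetical helper, assuming the remote listing exposes at least path and size - not tacl's actual logic):

def plan_sync(local_files, remote_files):
    """Return the local paths that still need uploading.

    local_files / remote_files: dicts mapping relative path -> size.
    A real client would also compare checksums or modification times;
    this only illustrates why unchanged files should be skipped.
    """
    return [
        path for path, size in local_files.items()
        if remote_files.get(path) != size
    ]

# Example: only b.csv differs, so only b.csv should be re-uploaded.
todo = plan_sync({"a.csv": 10, "b.csv": 20}, {"a.csv": 10, "b.csv": 15})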

ofrei commented 3 years ago

@denvdm btw, for me tsd-s3cmd works - I guess it could be the quickest way to resolve your data transfer.

Long term, let's push for a better tacl - it seems quite handy. If there is a limit of 5000 files it's likely a quick fix, but I'm more concerned about the slow performance of --upload-sync.

leondutoit commented 3 years ago

What behaviour do you want from the sync here? Do you want files that are removed locally (from NIRD) to be removed remotely (from TSD)?

leondutoit commented 3 years ago

The 5000 limit sounds weird, and I cannot imagine where that would come from. I will try to reproduce it.

If this is just a directory upload, and not a sync of a routinely changing directory, then tacl p33 --upload {directory} is more appropriate, since there is no waiting and you get automatic resume.

If it is a sync (and you want local changes to propagate to the remote) then you need to explicitly enable caching so you get resume:

tacl --guide sync

...

By default, there is no caching for sync, because the normal
use case would be to copy a directory which has many files
in total, but only a few changing ones. If you are in control
of the changes, and you know there will not be any changes while
your transfer is running, then you can enable caching like this:

    tacl p11 --download-sync mydir --cache-sync

This will allow resuming the sync without having to query the API
and the local filesystem for its current state.

You'll be using --upload-sync though.

denvdm commented 3 years ago

@leondutoit, I indeed used tacl p33 --upload dir. I don't know what may be causing the error, but as @ofrei indicated it does seem awfully coincidental that it gets stuck at such a round number. Then again, I already got an error a day earlier when it hadn't reached this number yet (see earlier messages). Re. the specific error message, as Alex also checked, the file definitely did exist. By the way (perhaps unfortunately w.r.t. figuring this out), Ivan has moved on with these files and they have now been removed from the import dir. I've started a new run of tacl for the remaining files and that is running smoothly.

leondutoit commented 3 years ago

Ah ok, --upload {dir} should do the right thing. I tested the 5000 limit like this: mkdir -p d1 && for i in `seq 1 5001`; do mkfile 1k d1/$i.txt; done; time tacl p11 --basic --upload d1 and didn't see any issues. Conceptually each file is an independent transfer, so there should be no issue.

leondutoit commented 3 years ago

I can reproduce the FileNotFoundError like this:

ldt:~ leondutoit$ mkdir -p d3 && for i in `seq 1 10`; do mkfile 1k d3/$i.txt; done; tacl p11 --basic --upload d3
uploading directory d3
d3/10.txt |################################| 100%
d3/9.txt |################################| 100%
d3/8.txt |################################| 100%
d3/5.txt |################################| 100%
d3/4.txt |################################| 100%
^C
Aborted!
ldt:~ leondutoit$ rm d3/3.txt
ldt:~ leondutoit$ tacl p11 --basic --upload d3
uploading directory d3
resuming directory transfer from cache
d3/4.txt |################################| 100%
d3/6.txt |################################| 100%
d3/7.txt |################################| 100%
Traceback (most recent call last):
  File "/usr/local/bin/tacl", line 33, in <module>
    sys.exit(load_entry_point('tsd-api-client==3.3.1', 'console_scripts', 'tacl')())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/tacl.py", line 518, in cli
  File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/sync.py", line 286, in sync
  File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/sync.py", line 586, in _transfer
  File "/usr/local/lib/python3.9/site-packages/tsd_api_client-3.3.1-py3.9.egg/tsdapiclient/sync.py", line 427, in _transfer_local_to_remote
FileNotFoundError: [Errno 2] No such file or directory: 'd3/3.txt'

What's happening here is that the local cache lists all files in the directory when the upload starts, and removes them as they succeed. Then I cancel the upload, delete a file that has not been uploaded yet, and restart the upload. The missing local file is still listed in the cache, and when tacl tries to upload it, it fails.
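Roughly, the resume logic looks like this (a simplified sketch with made-up names, not the actual tsdapiclient code): the cache is built once and entries are only evicted on success, so a file deleted locally after the cache was written blows up on resume. Pruning entries whose files have disappeared would avoid that.

import os

def build_cache(directory):
    """Snapshot of every file in the directory, taken once when the transfer starts."""
    return [os.path.join(directory, name) for name in os.listdir(directory)]

def resume_upload(cache, upload_one):
    for path in list(cache):
        # Possible fix: drop entries whose local file has disappeared instead of
        # letting os.stat()/open() raise FileNotFoundError.
        if not os.path.exists(path):
            cache.remove(path)
            continue
        upload_one(path)    # streams the file, may check os.stat(path).st_size, etc.
        cache.remove(path)  # entries are only evicted after a successful upload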

denvdm commented 3 years ago

Great, that makes a lot of sense, nicely solved! However, I did not delete any files in between the failed upload and the second attempt. And it seemed to be complaining about a different file missing at every attempt. Anyway, thanks for looking into this in detail. Tacl in general does seem like a very good solution, and easier to get working than tsd-s3cmd.

leondutoit commented 3 years ago

@ofrei I made a configuration change on the server, which sped up the scanning part of my sync of 5001 files from 2min to 9sec.

leondutoit commented 3 years ago

I'll assume this is not an issue anymore, but if it is, just ping me here.

ofrei commented 3 years ago

@leondutoit Thank you for fixing this! I'm busy (major grant deadline this Thursday), but I will re-test tacl sync performance on Friday. @denvdm are you continuing with any large-scale transfers, or are you done by now?

denvdm commented 3 years ago

Done for now, and I don’t see anything major coming up this month at least. TACL worked well (apart from that little hiccup)!

leondutoit commented 3 years ago

@ofrei the latest release of tacl, https://pypi.org/project/tsd-api-client/3.4.0/, includes better sync performance, better Windows support, and some other small improvements - feel free to give it a go.

ofrei commented 3 years ago

@leondutoit Upgraded to tacl 3.4.0 - now testing sync of github repo. From time to time it gives the following error:


DEBUG streaming data to https://api.tsd.usit.no/v1/p697/files/stream/p697-member-group/github/norment/moba_qc_imputation/.git/logs/HEAD?group=p697-member-group

DEBUG reading file: github/norment/moba_qc_imputation/.git/logs/HEAD
github/norment/moba_qc_imputation/.git/logs/HEAD
DEBUG reading chunk
github/norment/moba_qc_imputation/.git/logs/HEAD |################################| 100%
DEBUG chunk read complete

DEBUG reading chunk
github/norment/moba_qc_imputation/.git/logs/HEAD |################################| 100%
DEBUG no more data to read

405 Client Error: Method Not Allowed for url: https://api.tsd.usit.no/v1/p697/files/stream/p697-member-group/github/norment/moba_qc_imputation/.git/logs/HEAD?group=p697-member-group
The request was unsuccesful. Exiting.
leondutoit commented 3 years ago

@ofrei Ah yes, I saw that too. There is a config option on the server side which has to allow deletion of files, and I forgot to set it. Will do later today.

ofrei commented 3 years ago

@leondutoit Thank you! Happy to re-test when this is ready. I've also noticed that the files are synced in an unpredictable order. I.e. if I sync 3 folders with 10 files each, the order of the files is completely random, interleaving the folders at random. Is there a way to fix this? Perhaps it's as simple as adding "sorted", or changing some data structure to something like OrderedDict?

leondutoit commented 3 years ago

The order of the upload? Why is this a problem?

ofrei commented 3 years ago

It's not a big problem, but any non-deterministic behaviour is less user-friendly than things happening in a well-determined order. Take the error that I reported as an example: tacl synced a few files fine, then it encountered a delete operation and failed. I restarted - and it started syncing some other files until the next delete operation. So the failure looked quite unpredictable to me - I couldn't even see whether it was related to a specific file.

leondutoit commented 3 years ago

I already explained the delete issue. As for the "non-deterministic order", this is how Python returns directory entries:

In [81]: os.listdir?
Signature: os.listdir(path=None)
Docstring:
Return a list containing the names of the files in the directory.

path can be specified as either str, bytes, or a path-like object.  If path is bytes,
  the filenames returned will also be bytes; in all other circumstances
  the filenames returned will be str.
If path is None, uses the path='.'.
On some platforms, path may also be specified as an open file descriptor;\
  the file descriptor must refer to a directory.
  If this functionality is unavailable, using it raises NotImplementedError.

The list is in arbitrary order.  It does not include the special
entries '.' and '..' even if they are present in the directory.
Type:      builtin_function_or_method

Note: "The list is in arbitrary order." I'm not going to sacrifice performance and memory usage to force a certain order.
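For completeness, what is being asked for would look roughly like the following (a sketch, not a patch against tacl) - sorting each directory listing before iterating, which is exactly the extra work per directory I'd rather avoid:

import os

def iter_files_sorted(directory):
    """Yield files under `directory` in a deterministic (sorted) order.

    os.listdir()/os.scandir() return entries in arbitrary order; sorting each
    level costs O(n log n) per directory and keeps the full listing in memory,
    which is the trade-off mentioned above. Sketch only, not tacl code.
    """
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.isdir(path):
            yield from iter_files_sorted(path)
        else:
            yield path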

ofrei commented 3 years ago

ok, I see! Thanks.

leondutoit commented 3 years ago

The delete issue should be fixed now.

ofrei commented 3 years ago

@leondutoit I upgraded tacl, re-ran --upload-sync, and still have this issue:

405 Client Error: Method Not Allowed for url: https://api.tsd.usit.no/v1/p697/files/stream/p697-member-group/tsd_monitoring/.git/logs/refs/heads/master?group=p697-member-group

However, I'm closing this ticket - the original question from @denvdm is solved and we're back to using tsd-s3cmd.

leondutoit commented 3 years ago

Can't reproduce btw:

ldt:~ leondutoit$ rm d3/10.txt
ldt:~ leondutoit$ tacl  p11 --upload-sync d3
uploading directory d3
fetching information about directory: d3
deleting: d3/10.txt