uhd-urz / elAPI

An extensible API client for eLabFTW
GNU Affero General Public License v3.0
5 stars 0 forks source link

Add support for --overwrite - [merged] #92

Closed alexander-haller closed 4 months ago

alexander-haller commented 7 months ago

In GitLab by @mhxion on Apr 28, 2024, 24:40

Merges overwrite-for-export -> dev

This PR mainly introduces the overdue --overwrite support for --export. We also fix some related bugs.

No warning before creating empty files

Fixes #17. elapi will no longer show WARNING PATH=<path> could not be found. An attempt to create PATH=<path> will be made. before creating a new file through --export.

$ elapi get info --export ~/info.yml
INFO     info data successfully exported to /Users/<username>/info.yml in YAML format.                                               export.py:71
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
No warning before creating info.yml

If --export is given a path that includes a directory that doesn't exist, then elAPI will create that directory, and show an INFO message before exporting.

$ elapi get info --export ~/foo/bar/info.yml
INFO     Directory PATH=/Users/<username>/foo/bar from SOURCE=/Users/<username>/foo/bar/info.yml could not be found. An attempt to      path.py:148
         create directory /Users/<username>/foo/bar will be made.
INFO     info data successfully exported to /Users/<username>/foo/bar/info.yml in YAML format.                                       export.py:71

This INFO message can still get annoying (thought it makes debugging easier). I will be still be open to disabling this INFO message altogether (and maybe only store the INFO message in elAPI log file for debugging).

--overwrite

From elapi get --help:

--overwrite                If given --export/-e path is a file, and it already exists but it isn't empty, elapi will not overwrite to the  
                           file by default, and instead will use fallback location. --overwrite can be passed if elapi should overwrite a   
                           non-empty file when exporting.

Simple use-case

The following can be the simplest use-case of --overwrite.

$ elapi get info --export ~/info.yml
INFO     info data successfully exported to /Users/<username>/info.yml in YAML format.

Now ~/info.yml already exists. If we try to export to ~/info.yml again:

$ elapi get info --export ~/info.yml
WARNING  --export path '/Users/<username>/info.yml' already exists and is not empty! elapi will use fallback export location.       export.py:118

   Note: Use '--overwrite' to force '--export' to write to a non-empty file.

INFO     info data successfully exported to /Users/<username>/elapi/2024-04-28_002535_info.json in JSON format.                        export.py:71

I.e., if a file already exists and is not empty, elapi will use fallback location for exporting instead of failing/aborting. Fallback location can be found from elapi show-config. If we try to export to ~/info.yml once more with --overwrite:

$ elapi get info --export ~/info.yml --overwrite
INFO     info data successfully exported to /Users/culdesac/info.yml in YAML format.                                               export.py:71

The export succeeds.

--export ignores empty file

--export will not fallback or will not need --overwrite if a given file already exists but it is empty. In other words, empty files are overwritten anyway.

$ touch ~/info_2.yml
# ~/info_2.yml now exists but empty.
$ elapi get info --export ~/info_2.yml
INFO     info data successfully exported to /Users/culdesac/info_2.yml in YAML format.                                               export.py:71

This way --export does not overwrite to a file if and only if the file is non-empty.

Note: The way elAPI decides if a file is empty or not is by reading the file in byte mode. If a file is truly empty, no byte will be read.

alexander-haller commented 7 months ago

In GitLab by @mhxion on Apr 28, 2024, 24:40

requested review from @alexander-haller

alexander-haller commented 6 months ago

This INFO message can still get annoying (thought it makes debugging easier). I will be still be open to disabling this INFO message altogether (and maybe only store the INFO message in elAPI log file for debugging).

I suggest a wget-like behavior:

alex@ahqurz1:/tmp$ wget exmple.com\ --2024-04-29 18:44:05--  http://exmple.com/\ Resolving exmple.com (exmple.com)... 67.210.233.131\ Connecting to exmple.com (exmple.com)|67.210.233.131|:80... connected.\ HTTP request sent, awaiting response... 200 OK\ Length: 2301 (2,2K) [text/html]\ Saving to: ‘index.html.1’\ \ index.html.1                                               100%[========================================================================================================================================>]   2,25K  --.-KB/s    in 0s       \ \ 2024-04-29 18:44:05 (130 MB/s) - ‘index.html.1’ saved [2301/2301]\ \ alex@ahqurz1:/tmp$ wget --quiet example.com\ alex@ahqurz1:/tmp$

alexander-haller commented 6 months ago

But only nice-to-have for future thought.

alexander-haller commented 6 months ago

--export will not fallback or will not need --overwrite if a given file already exists but it is empty. In other words, empty files are overwritten anyway.

this is thoughtful but needs to be well-documented since it is not obvious (but probably also not many people will have this case)

alexander-haller commented 6 months ago

In GitLab by @mhxion on Apr 29, 2024, 23:02

--quiet sounds like a perfect remedy for this. I will open an feature-request issue, and work on after generate-table.

alexander-haller commented 6 months ago

In GitLab by @project_994_bot_1c5bd6ac1fdbd740fbc4ed080ecff58d on Apr 29, 2024, 23:32

added 1 commit

Compare with previous version

alexander-haller commented 6 months ago

In GitLab by @mhxion on Apr 29, 2024, 23:32

That's a fair concern. I have added the line Note that even without --overwrite, elapi will overwrite an existing file if the file is empty., and updated the entire --help for --overwrite.

image

alexander-haller commented 6 months ago

In GitLab by @project_994_bot_1c5bd6ac1fdbd740fbc4ed080ecff58d on May 4, 2024, 02:27

added 2 commits

Compare with previous version

alexander-haller commented 6 months ago

In GitLab by @project_994_bot_1c5bd6ac1fdbd740fbc4ed080ecff58d on May 4, 2024, 18:01

added 3 commits

Compare with previous version

alexander-haller commented 6 months ago

In GitLab by @mhxion on May 4, 2024, 18:06

I have updated the behavior. Without --overwrite, elapi will not overwrite if the file already exists disregarding whether or not it's empty. The updated documentation for --overwrite:

image

An example:

$ touch ~/info.yml
$ elapi get info --export ~/info.yml

WARNING  --export path '/Users/<username>/info.yml' already exists! elapi will use fallback export location.                        export.py:117

   Note: Use '--overwrite' to force '--export' to write to an existing file.

INFO     info data successfully exported to /Users/<username>/Downloads/elapi/2024-05-04_180531_info.json in JSON format.            export.py:71
$ elapi get info --export ~/info.yml --overwrite
INFO     info data successfully exported to /Users/<username>/info.yml in YAML format.                                               export.py:71
alexander-haller commented 6 months ago

In GitLab by @mhxion on May 4, 2024, 18:19

Insight: This change wasn't as straightforward as I thought. In fact, I had forgotten why I initially decided to only overwrite a file if it was non-empty. The main reason was: Any file path given to --export would be validated first. And PathValidator would always create the file as part of its validation logic. If the file didn't exist before validation, it would exist after validation. So the file would always exist for --export! So it wasn't possible for export to decide if a file exists because it was created with touch or because it was created as part of the validation. So checking whether or not the file was empty was the best approach. I now added a new attribute to PathValidator which makes it possible for --export to know if an empty file merely exists because of a result of validation or not.

alexander-haller commented 6 months ago

If the file didn't exist before validation, it would exist after validation.

does PathValidator not clean up the file? Note to me: I need to look at the code ;-)

alexander-haller commented 6 months ago

In GitLab by @mhxion on May 6, 2024, 15:06

does PathValidator not clean up the file?

It can but it doesn't do it by default. I recently added this retain_created_file attribute to PathValidator to address this concern. The reason the default value for retain_created_file is True so PathValidator doesn't clean up a created file by default (except the .tmp_*** for directories only) is because the only reason we use PathValidator is to make sure if a file is actually writable (this includes all sorts of permission issues, storage capacity issues, etc.). I.e., we want to write to the file right after validating. This is why PathValidator by default doesn't clean up or delete the file it creates because it assumes we will be writing to the file soon (with --export). Deleting it will mean, --export will have to create the file again. This I/O operation for re-creating the file is unnecessary in most cases as we know we're definitely going to write to the file after validating.

There can be some very special cases where 1. PathValidator creates the file successfully (with touch), 2. attempts to write (a single character) into the file to check if the file is writable, but it fails, so PathValidator throws a ValidationError. This file was invalidated, so we will not be writing to this file. But because of (1) this file (empty) exists, and now it is just hanging around without being cleaned up. But I have yet to come across this case. Usually, when a file has permission and similar issues, (1) always fails. I.e., PathValidator is unable to create the file, so an empty file never hangs around, and it never has to clean up a file by default. In any case, we have the option to clean up by passing retain_created_file=False if we encounter such a case.

alexander-haller commented 6 months ago

In GitLab by @mhxion on May 6, 2024, 15:12

Note: I could be wrong and deleting/cleaning-up the file is a very cheap I/O operation. In that case, cleaning up would be a clean solution too. In fact, cleaning up was my first solution (commit 6b2b906, line 86 ;)). But I then removed it next day, in favor of checking where the file was created from, so we'd not have to re-create the file.

alexander-haller commented 6 months ago

Main concern would be dummy files that exist when the later operation fails. But honestly this is already very advanced "what could go wrong" ;-)

I'm fine with both approaches.

alexander-haller commented 6 months ago

approved this merge request

alexander-haller commented 6 months ago

In GitLab by @mhxion on May 9, 2024, 02:14

Main concern would be dummy files that exist when the later operation fails.

Totally understandable. I will add an internal note on this. Thank you for approving!

alexander-haller commented 6 months ago

In GitLab by @mhxion on May 9, 2024, 02:14

resolved all threads