jupyterlab / jupyterlab

JupyterLab computational environment.
https://jupyterlab.readthedocs.io/

Lab 4 editor corrupting files with CRLF (Windows) line endings with RTC enabled/present #14715

Open fperez opened 1 year ago

fperez commented 1 year ago

Update

Leaving the original description below, but it turns out the problem is with files having Windows line endings (\r\n). Therefore the recipe to replicate it that I suggested doesn't actually work, since I copy-pasted data in the markdown description. See below for a simple way to replicate the problem instead.

Description

The screenshot below illustrates the problem, but it should be easy to reproduce (steps below). The file payments-original.csv is shown for reference; I opened it and left it untouched. I edited a copy, called payments.csv, aiming to remove the $ signs in the Amount column. Though I carefully edited only the $ signs, the file got corrupted, as shown in payments-2.csv: the edits modified a different part of each line.

image

Reproduce

  1. Save the following content to a CSV file:
Sent,Payee,Amount,Deliver By,Payment Account,Status,Payment Category,Cleared On,Confirmation Number,Memo
5/4/22,John Doe,"$2,700.00 ",5/11/22,XXXXXX7747,PAID, None,5/11/22,J9YKS8BK,
4/8/22,John Doe,"$1,500.00 ",4/15/22,XXXXXX7747,PAID, None,4/22/22,P9YKJ73P,
3/9/22,John Doe,"$1,500.00 ",3/16/22,XXXXXX7747,PAID, None,3/24/22,39BK2HFS,
  2. Open it with the CSV viewer and then with the editor, with JupyterLab 4.0.2 in RTC mode.
  3. Edit each line to remove the $ signs.
  4. Once the CSV table view refreshes, observe the corrupted results.

Expected behavior

Editor doesn't corrupt files being edited.

Context

JasonWeill commented 1 year ago

Possibly related to #14752, although #14752 does not necessarily concern RTC.

fperez commented 1 year ago

For record keeping: during yesterday's JLab team call, I tried to reproduce the problem on the exact same system where I originally encountered it, and couldn't. If it happened to me once, it will happen again to someone, so I hope we can figure out what the problem is, but obviously it's going to be harder if replication isn't straightforward. I'll keep reporting if I make any progress.

minrk commented 1 year ago

Since the offset is off by one more each line, I'm almost certain this is something counting the CRLF line ending as one character and something else counting it as two.
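As a small illustration of how two readers can disagree about the length of the same CRLF file, the sketch below reads identical bytes twice with Python's built-in `open`. It only shows that such a mismatch is easy to produce; it is not a confirmed account of where JupyterLab or jupyter-collaboration does this, and the temporary file name is arbitrary.

```python
import os
import tempfile

# Five CRLF-terminated lines, same content as the reproducer in this thread.
raw = b"12345x6789\r\n" * 5

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Default text mode (universal newlines): "\r\n" is translated to "\n".
    with open(path, "r") as f:
        text_default = f.read()

    # newline="" disables translation, so "\r\n" stays two characters.
    with open(path, "r", newline="") as f:
        text_raw = f.read()

print(len(text_default), len(text_raw))  # 55 60
```

The two lengths differ by exactly one character per line, which is the size of the drift described above.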

Fully reproducible screencast:

https://github.com/jupyterlab/jupyterlab/assets/151929/f35b84bf-7c20-452f-81c7-49a3847e8338

code to create the file:

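# newline="" writes the "\r\n" verbatim on every platform
# (default text mode on Windows would turn it into "\r\r\n")
with open("file.txt", "w", newline="") as f:
    for i in range(5):
        f.write("12345x6789\r\n")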

Observe that the actual character deleted shifts by one for each line:

123456789
1234x6789
1235x6789
1245x6789
1345x6789

which I suspect is explained by something or other not accounting for the \r character consistently, e.g. Python opening a file in text mode with universal newlines, which translates \r\n to \n on read.
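The pattern above can be reproduced with a toy model of that hypothesis. In this sketch the names `client` and `server` are purely illustrative: `client` stands for a view of the text with LF-normalized line endings, `server` for the text with its original CRLF endings, and every deletion offset computed on the former is applied unchanged to the latter. It assumes nothing about the actual JupyterLab code path.

```python
# Toy model: offsets computed on LF-normalized text, applied to CRLF text.
server = "12345x6789\r\n" * 5            # text as stored, CRLF line endings
client = server.replace("\r\n", "\n")    # text as edited, one char per newline

for _ in range(5):
    # The editor deletes the next "x" at the offset it sees...
    offset = client.index("x")
    client = client[:offset] + client[offset + 1:]
    # ...and the same offset is applied to the CRLF text, which is
    # one character longer for every line above the edit.
    server = server[:offset] + server[offset + 1:]

result = server.split("\r\n")[:-1]       # drop the empty piece after the last CRLF
print("\n".join(result))
# 123456789
# 1234x6789
# 1235x6789
# 1245x6789
# 1345x6789
```

The first line is edited correctly because no line ending precedes it; each later line is off by one more character, matching the observed corruption.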

pip freeze

```
# fresh Python 3.11.3 venv with `pip install jupyterlab jupyter-collaboration`
aiosqlite==0.19.0
anyio==3.7.0
appnope==0.1.3
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-lru==2.0.2
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
build==0.10.0
certifi==2023.5.7
cffi==1.15.1
charset-normalizer==3.1.0
click==8.1.3
comm==0.1.3
debugpy==1.6.7
decorator==5.1.1
defusedxml==0.7.1
executing==1.2.0
fastjsonschema==2.17.1
fqdn==1.5.1
idna==3.4
ipykernel==6.23.3
ipython==8.14.0
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.17.3
jupyter-events==0.6.3
jupyter-lsp==2.2.0
jupyter-ydoc==1.0.2
jupyter_client==8.3.0
jupyter_collaboration==1.0.1
jupyter_core==5.3.1
jupyter_server==2.7.0
jupyter_server_fileid==0.9.0
jupyter_server_terminals==0.4.4
jupyterlab==4.0.2
jupyterlab-pygments==0.2.2
jupyterlab_server==2.23.0
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mistune==3.0.1
nbclient==0.8.0
nbconvert==7.6.0
nbformat==5.9.0
nest-asyncio==1.5.6
notebook_shim==0.2.3
overrides==7.3.1
packaging==23.1
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
pip-tools==6.13.0
platformdirs==3.8.0
prometheus-client==0.17.0
prompt-toolkit==3.0.38
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.15.1
pyproject_hooks==1.0.0
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.7
PyYAML==6.0
pyzmq==25.1.0
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
soupsieve==2.4.1
stack-data==0.6.2
terminado==0.17.1
tinycss2==1.2.1
tornado==6.3.2
traitlets==5.9.0
typing_extensions==4.7.0
uri-template==1.3.0
urllib3==2.0.3
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.1
y-py==0.6.0
ypy-websocket==0.12.1
```

fperez commented 1 year ago

Thanks @minrk! That is consistent with the fact that the offset got worse on each row I edited, and it would also explain why I couldn't reproduce it on Wednesday: I was using the data I had copied into the markdown description of the issue, instead of the actual original file I had the problem with, which was sent by my bank.

I'll edit the issue description to clarify that it's about CRLF-terminated lines in files, not about CSVs.
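For anyone wanting to check whether a given file is CRLF-terminated before trying to reproduce, a minimal sketch follows. The helper name `count_line_endings` is hypothetical (not part of any library), and it works on raw bytes so that no newline translation can hide the \r characters.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF, and bare CR line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CRLF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CRLF pair
    return {"crlf": crlf, "lf": lf, "cr": cr}


# Literal bytes for illustration; a real check would read the file in binary mode.
print(count_line_endings(b"a,b\r\nc,d\r\n"))  # {'crlf': 2, 'lf': 0, 'cr': 0}
```

To inspect a real file, pass it the result of `pathlib.Path(...).read_bytes()`. Text pasted through a web form, as in the original reproduction steps, would typically report only `lf` endings.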

@JasonWeill - this should make it a lot easier to track this down! It's an actual data corruption bug, but it should also be fairly straightforward to replicate, and hopefully to fix :)

fperez commented 1 year ago

And this means it probably is indeed related to #14752, as pointed out by @JasonWeill...

echarles commented 1 year ago

The title of this issue still contains "...when collaborative mode is on". Does it happen only when RTC is enabled, or always (in which case the title should be updated)?

JasonWeill commented 1 year ago

#14752 mentions JupyterLab 4 without a specific mention of RTC being on. I asked for clarification, but by default, RTC is off, so this might apply all the time.

minrk commented 1 year ago

I re-ran the same sequence without collaborative mode and the edits were correct, no misalignment, so at least for this particular issue, it does seem specific to collaboration.

fperez commented 1 year ago

I can reproduce the problem even without passing the --collaborative flag, though with the jupyter-collaboration package installed.

That's actually something I'd noticed a few days ago and meant to report as a separate issue: I am getting the impression that the mere presence of jupyter-collaboration in the environment is enough to activate RTC, even without the explicit flag. That is not what the docs say and I'd consider it a bug, as it makes it impossible to disable RTC without doing a full package uninstall. I could be wrong, but some quick testing seemed to indicate that, and if that's the case, the OP in #14752 might be having that experience, if they have the package installed.
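Since the presence of the package itself seems to matter here, a quick way to check whether it is installed in the current environment is sketched below, using only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and this only reports whether the distribution is installed, not whether RTC is actually active.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints a version string if jupyter-collaboration is installed, else None.
print(installed_version("jupyter-collaboration"))
```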

JasonWeill commented 1 year ago

Per comments on #14752, this bug is not necessarily associated with RTC mode. Retitled.

minrk commented 1 year ago

Yes, to be clear, I removed jupyter-collaboration to run without collaborative mode. With collaboration installed but not explicitly enabled, the problem still occurs. However, in that case the extension is still loaded; the logs contain:

[I 2023-07-01 09:00:52.307 minrk] jupyter_collaboration | extension was successfully loaded.

and the UI has the collaborators tab enabled.

I believe this is because installing jupyter-collaboration enables collaborative mode without a separate opt-in because of this config, so the --LabApp.collaborative flag may be a no-op now.

minrk commented 1 year ago

Also, because #14752 mentions browser differences: I'm using Safari on macOS.

echarles commented 1 year ago

> I re-ran the same sequence without collaborative mode and the edits were correct, no misalignment, so at least for this particular issue, it does seem specific to collaboration.

I have gone through the reproducer with Chrome on macOS and confirm that the issue arises with RTC but does not happen without RTC.

> I am getting the impression that the mere presence of jupyter-collaboration in the environment is enough to activate RTC, even without the explicit flag.

I have noticed the same behaviour for a few days. I have opened https://github.com/jupyterlab/jupyterlab/issues/14774

> Per comments on https://github.com/jupyterlab/jupyterlab/issues/14752, this bug is not necessarily associated with RTC mode. Retitled.

@JasonWeill I think, from @minrk's previous comment and also from my own test, that the issue only happens with RTC enabled. The title change should be reverted, IMHO.

echarles commented 1 year ago

Windows should be removed from the title; I can reproduce this on macOS.

JasonWeill commented 1 year ago

The error concerns the line endings as used by Windows (CRLF), so I'll leave that part unchanged, but I can reinstate the mention of RTC. Thanks for the update.