jupyter / nbdime

Tools for diffing and merging of Jupyter notebooks.
http://nbdime.readthedocs.io
Other
2.67k stars 160 forks source link

Failure to resolve cleanly when the attachments dict is deleted on one side and edited on the other side #196

Open martinal opened 7 years ago

martinal commented 7 years ago

To reproduce and see why this is a problem:

Checkout the output-diff-improvements branch (at the time of writing in a PR).

Go to nbdime/tests/files.

Try merging these files:

martinal:nbdime/nbdime/tests/files (output-diff-improvements) $ nbmerge -o merged.ipynb attachment.ipynb attachment--change_attachment.ipynb attachment--remove_attachment.ipynb 
[W autoresolve:260] autoresolving conflict at /cells/0/attachments with inline-attachments
[W nbmergeapp:47] Conflicts occured during merge operation.
[I nbmergeapp:60] Merge result written to merged.ipynb

Then these files:

martinal:nbdime/nbdime/tests/files (output-diff-improvements) $ nbmerge -o merged2.ipynb attachment.ipynb attachment--change_attachment.ipynb attachment--empty_attachments_list.ipynb 
[W autoresolve:260] autoresolving conflict at /cells/0/attachments with inline-attachments
[W nbmergeapp:47] Conflicts occured during merge operation.
[I nbmergeapp:60] Merge result written to merged2.ipynb

Then see that the results are different:

martinal:nbdime/nbdime/tests/files (output-diff-improvements) $ nbdiff merged.ipynb merged2.ipynb 
nbdiff merged.ipynb merged2.ipynb
--- merged.ipynb  2016-12-07 13:06:27.400383
+++ merged2.ipynb  2016-12-07 13:06:33.008367
## replaced /cells/0/attachments/test.png/image/png:
-  iVBORw0K...<snip base64, md5=90b54a2ccca65363...>
+  iVBORw0K...<snip base64, md5=7fbdb737b44d1aa8...>

The only difference between the two cases is that one notebook has no attachments field and another has an empty attachment dict:

martinal:nbdime/nbdime/tests/files (output-diff-improvements) $ nbdiff attachment--empty_attachments_list.ipynb attachment--remove_attachment.ipynb 
nbdiff attachment--empty_attachments_list.ipynb attachment--remove_attachment.ipynb
--- attachment--empty_attachments_list.ipynb  2016-12-07 13:05:49.924495
+++ attachment--remove_attachment.ipynb  2016-11-24 13:48:49.432845
## deleted /cells/0/attachments:
-  attachments:
vidartf commented 7 years ago

Would you mind writing up a short description of what inline-attachment is intended to do? Just to make sure we're not trying to treat a symptom instead of the root cause :)

martinal commented 7 years ago

If you add a breakpoint or print in the make_inline_attachments_decisions function you'll see that it's not called. That's the root cause and why I wrote this issue as I did.

However the current implementation of the inline-attachment strategy will deal with a delete vs patch or replace conflict by just picking the modified attachment. This is not covered by unit tests yet but can be tested manually by running nbmerge on the attachment*.ipynb files and using nbshow -a on the input files and merged file to inspect the attachment md5 value.

vidartf commented 7 years ago

I meant on a higher level: How do you expect a final conflict to look after getting a conflict on one or more attachments? How will the user see this?

martinal commented 7 years ago

In the current implementation a delete/patch conflict on an attachment will not be visible, since the modified attachment will be the new value. I'm all ears on ways to display or record that kind of conflict, but that's a separate issue.

Also, since you ask, a patch/patch or equivalent replace/replace conflict will result in new attachments named LOCAL_original.png and REMOTE_original.png. I'm also open to discuss how this should be, but again that's a separate issue and of lesser importance right now.

vidartf commented 7 years ago

If editing the attachment name, then any references in source to that attachment will be broken, meaning e.g. that images will simply show up as broken. As such, I was actually wondering whether all "inline" things should actually not use custom diffs, but instead have an inline action, and then apply the inline value in apply_decisions.

martinal commented 7 years ago

Maybe if we apply strategies in the generic merge, we can create an inline decision right away without splitting up the diffs there into multiple decisions that need bundling later. Then call something like the current make_inline_foo_decisions in apply_decisions.

martinal commented 7 years ago

However, I don't see any way to preserve both local and remote modifications as inline conflicts in the merged notebook without storing them under different names. One approach could be to replace the reference to the attachments in the markdown source with references to both local and remote attachment versions (with the new filenames) such that they will be shown, and place conflict markers in the source:

unchanged markdown text...

<<<<<<< local
![image alt text](attachment://LOCAL_image.png)
=======
![image alt text](attachment://REMOTE_image.png)
>>>>>>> remote

more unchanged markdown text...

Given that we don't have any products using the attachment feature, I suggest we wait and see.

vidartf commented 7 years ago

I'd vote for using the last solution as a decent initial guess.

vidartf commented 7 years ago

Is this still an issue, or was this solved?

martinal commented 7 years ago

Still an issue although the problem is elsewhere.