roc-lang / book-of-examples

Software Design by Example in Roc

topic: diff tool #15

Open hristog opened 8 months ago

hristog commented 8 months ago
kdziamura commented 8 months ago

A related ticket: https://github.com/roc-lang/book-of-examples/issues/3

One of the things that sits between backup and version control systems is a diff tool, which might be an interesting project on its own.

If the term isn't deemed too hyped up (and, as such, a distraction from the actual substance), the chapter could also touch upon the connection to, and parallels with, blockchain technologies.

The beauty of the Merkle tree. I think pattern recognition is a great takeaway from teaching materials. I would suggest having the blockchain as an extra task: “you know how git works, now implement a simple blockchain”. This way there's a high chance the pattern will stick, as the task demands creativity and delivers a eureka effect.

gvwilson commented 8 months ago

I think a diff tool would be a more natural extension to the file backup tool than blockchain - I don't know of any uses of the latter in software development. (One of the themes in the earlier books has been to focus on tools that programmers use for programming.) If you did decide to go with diff, showing how to diff trees (e.g., semantic diff for HTML and SVG) as well as line-oriented diff would be really cool.
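To make the contrast with line-oriented diff concrete, here is a minimal sketch of a tree diff in Python. It is only an illustration under stated assumptions: the function name `tree_diff` and the output format are hypothetical, and the comparison is naively positional (children are paired by index), so it is far simpler than a real semantic diff for HTML or SVG.

```python
import xml.etree.ElementTree as ET


def tree_diff(old, new, path=None):
    """Naive positional diff of two element trees (hypothetical helper).

    Returns a list of human-readable change descriptions. Children are
    paired by index, so an insertion shifts every later sibling.
    """
    path = path or old.tag
    if old.tag != new.tag:
        return [f"{path}: <{old.tag}> replaced by <{new.tag}>"]
    changes = []
    # Compare attributes key by key so only the changed ones are reported.
    for key in sorted(set(old.attrib) | set(new.attrib)):
        before, after = old.attrib.get(key), new.attrib.get(key)
        if before != after:
            changes.append(f"{path}: @{key} {before!r} -> {after!r}")
    # Ignore surrounding whitespace: indentation is not a semantic change.
    before, after = (old.text or "").strip(), (new.text or "").strip()
    if before != after:
        changes.append(f"{path}: text {before!r} -> {after!r}")
    old_kids, new_kids = list(old), list(new)
    for i, (a, b) in enumerate(zip(old_kids, new_kids)):
        changes += tree_diff(a, b, f"{path}/{a.tag}[{i}]")
    for i in range(len(new_kids), len(old_kids)):
        changes.append(f"{path}/{old_kids[i].tag}[{i}]: removed")
    for i in range(len(old_kids), len(new_kids)):
        changes.append(f"{path}/{new_kids[i].tag}[{i}]: added")
    return changes


old = ET.fromstring('<svg><circle r="1"/><rect/></svg>')
new = ET.fromstring('<svg>\n  <circle r="2"/>\n</svg>')
print(tree_diff(old, new))
# ["svg/circle[0]: @r '1' -> '2'", 'svg/rect[1]: removed']
```

Note that the re-indentation of the second document is not reported, which is the main thing a line-oriented diff would get wrong here.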

Note: tagging as duplicate until the overlap with #3 is resolved.

hristog commented 8 months ago

@gvwilson, indeed, blockchain technology is used as a decentralised database in the fintech sector, but it isn't as widespread as version control is.

I'd like to double-check whether my understanding is correct - in your opinion, would a semantic diff-tool topic be a better companion to the file backup topic than a chapter on version control? Would a minimalistic version control system featuring a semantic diff tool be considered a sufficiently non-overlapping topic?

Thanks!

gvwilson commented 8 months ago

@hristog I could be misunderstanding (please correct me if I am) but if one chapter shows how to use hashes to record unique files plus manifests to record "commits", that's a full hour (see https://third-bit.com/sdxjs/file-backup/ and https://third-bit.com/sdxpy/archive/). Looking at http://gitlet.maryrosecook.com/ and https://shop.jcoglan.com/building-git/, the jump from that to something that supports branching and merging is quite large; is there a subset that you think would fit in our teachable-in-an-hour limit? If so, what features would it include beyond "I can tell which files were backed up at the same time"?
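For readers who haven't seen the linked chapters, the "hashes plus manifests" idea can be sketched roughly as follows in Python. This is a minimal sketch, not the code from either book: the function name `backup`, the `.bck` suffix, the JSON manifest format, and the `timestamp` parameter are all assumptions made for illustration, and it assumes the backup directory is not inside the source directory.

```python
import hashlib
import json
import time
from pathlib import Path


def backup(src_dir, backup_dir, timestamp=None):
    """Copy each unique file once and record a manifest (hypothetical sketch).

    Files are stored under their SHA-256 digest, so identical content is
    saved only once. The manifest maps relative paths to digests and plays
    the role of a "commit": it says which files existed at the same time.
    """
    src, dst = Path(src_dir), Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        data = path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        blob = dst / f"{digest}.bck"
        if not blob.exists():  # content already backed up: skip the copy
            blob.write_bytes(data)
        manifest[path.relative_to(src).as_posix()] = digest
    stamp = int(time.time()) if timestamp is None else timestamp
    manifest_path = dst / f"{stamp}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path
```

Calling `backup("project", "archive")` twice without changing any files would add a second manifest but no new `.bck` files, which is the property the question about "features beyond this" builds on.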

Regarding the diff tool (line-oriented or semantic), I think that's a chapter in itself: I don't know what Git uses for calculating diffs, but the algorithms in Python's difflib are complex enough to take an hour to explore.
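As a point of reference for what a line-oriented diff produces, here is a small Python sketch using the standard library's `difflib`. The sample data and the helper names `edit_script` and `line_diff` are made up for illustration; the sketch only shows the library's output, not how its matching algorithm works internally.

```python
import difflib

old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "gamma\n", "delta\n"]


def edit_script(old_lines, new_lines):
    """Return difflib's opcodes: (tag, old_lo, old_hi, new_lo, new_hi)."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    return matcher.get_opcodes()


def line_diff(old_lines, new_lines):
    """Return a unified diff of the two line lists as a list of strings."""
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile="old", tofile="new")
    )


print(edit_script(old, new))
# [('equal', 0, 1, 0, 1), ('delete', 1, 2, 1, 1),
#  ('equal', 2, 3, 1, 2), ('insert', 3, 3, 2, 3)]
print("".join(line_diff(old, new)))
```

The opcodes are the edit script (keep `alpha`, delete `beta`, keep `gamma`, insert `delta`); the unified diff is the same information rendered in the format that tools like `patch` expect.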

So now that I've written this, yes, I think I was misunderstanding - my apologies for that. If there's an extension to file backup that will show readers a bit more about how hash-based version control systems work and will fit in an hour, that would be great; separately, I think there's scope for a chapter on diffing, because it will also take an hour, and I think it might be a prerequisite for the mini-version control system (because I don't know how you'd manage merging without diff, but maybe?).

Cheers - Greg

hristog commented 8 months ago

Hi @gvwilson,

Thank you for clarifying! I believe my understanding is now substantially better (also thanks to your confirmation on the "machine learning from first principles" proposal in #24). I'll explore both of the proposed directions and then move forward with whichever proof-of-concept meets the requirements more closely.

I'll also make sure to update this issue with further information accordingly.

Thank you for your time!

hristog commented 8 months ago

A brief update - based on my local prototypes, indeed a diff tool chapter would fit much better than a minimalistic yet rushed (due to the associated time constraint of an hour) attempt at presenting a meaningful subset of a version control system.

As you've nicely indicated, the narrative would flow quite naturally from the concept of backups through a diff tool chapter. The latter itself could provide as a deliverable an actual tool which could be readily plugged into git and used as a custom diff tool via, e.g.:

git difftool --extcmd=roc-diff ...

or

git difftool --tool=roc-diff ...

with a tiny bit of configuration required in the latter case (the same configuration could optionally also make the tool the default git diff tool).

Further, the chapter might conclude with a "Where to go from here?" section, which could discuss possible directions for improvement and also touch upon the concept of a three-way merge, which - as one of the fundamental building blocks of a version-control system - is based directly on the very diff tool that is the main protagonist of this chapter.
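To illustrate how a three-way merge can be built directly on a two-way diff, here is a rough Python sketch. It is an assumption-laden illustration rather than what git does: it uses `difflib.SequenceMatcher` as a stand-in for the chapter's diff tool, the names `sync_regions` and `merge3` are hypothetical, and conflicts are simply wrapped in markers.

```python
from difflib import SequenceMatcher


def sync_regions(base, a, b):
    """Find spans of base that are unchanged in both a and b.

    Returns (base_lo, base_hi, a_lo, b_lo) tuples, ending with a sentinel
    that marks the end of all three sequences.
    """
    blocks_a = SequenceMatcher(None, base, a, autojunk=False).get_matching_blocks()
    blocks_b = SequenceMatcher(None, base, b, autojunk=False).get_matching_blocks()
    regions = []
    ia = ib = 0
    while ia < len(blocks_a) and ib < len(blocks_b):
        base_a, pos_a, len_a = blocks_a[ia]
        base_b, pos_b, len_b = blocks_b[ib]
        # Intersect the two matching ranges, measured in base coordinates.
        lo = max(base_a, base_b)
        hi = min(base_a + len_a, base_b + len_b)
        if lo < hi:
            regions.append((lo, hi, pos_a + lo - base_a, pos_b + lo - base_b))
        # Advance whichever block ends first in base.
        if base_a + len_a < base_b + len_b:
            ia += 1
        else:
            ib += 1
    regions.append((len(base), len(base), len(a), len(b)))
    return regions


def merge3(base, a, b):
    """Merge a and b relative to base; return (merged_lines, conflict_count)."""
    merged, conflicts = [], 0
    base_i = a_i = b_i = 0
    for lo, hi, a_lo, b_lo in sync_regions(base, a, b):
        base_chunk, a_chunk, b_chunk = base[base_i:lo], a[a_i:a_lo], b[b_i:b_lo]
        if a_chunk == base_chunk:
            merged += b_chunk  # only b changed here (or nobody did)
        elif b_chunk == base_chunk or a_chunk == b_chunk:
            merged += a_chunk  # only a changed, or both made the same change
        else:
            conflicts += 1
            merged += ["<<<<<<< a", *a_chunk, "=======", *b_chunk, ">>>>>>> b"]
        merged += base[lo:hi]  # the span all three versions agree on
        base_i, a_i, b_i = hi, a_lo + (hi - lo), b_lo + (hi - lo)
    return merged, conflicts


base = ["one", "two", "three", "four"]
ours = ["one", "TWO", "three", "four"]
theirs = ["one", "two", "three", "four", "five"]
print(merge3(base, ours, theirs))
# (['one', 'TWO', 'three', 'four', 'five'], 0)
```

The design point is that no new matching algorithm is needed: the merge runs the two-way diff twice (base against each side) and only has to decide, region by region, which side changed.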

gvwilson commented 8 months ago

:+1: thank you - I'll mark this one down as yours. Can you please create a sub-directory under the project root called "diff" and put your work there along with an index.md file with point-form notes as you go along?

hristog commented 8 months ago

Yes, shall do - thanks!