indic-transliteration / sanscript.js

Transliteration package for Indian scripts
MIT License

Various improvements/refactoring to modernize the project (old) #20

Closed (rramphal closed this pull request 4 years ago)

rramphal commented 4 years ago

All tests continue to pass, both when running test/index.html and when running npm run test.

psvenk commented 4 years ago

By the way, I had actually kept an .editorconfig file (ignored by Git) while I was working on the other two PRs just so that I didn't need to change my Vim indentation settings every time I opened a file. Thanks for adding an .editorconfig.

vvasuki commented 4 years ago

Outputting the generated file into its own directory makes for a nicer directory hierarchy, but it makes an avoidable breaking change and means that anyone using Sanscript.js via npm will need to change their include path, and the dist directory would also only contain one file.

Thanks for the review, @psvenk ! @rramphal Let's not go for the breaking change mentioned above.

psvenk commented 4 years ago

By the way, another comment I forgot to mention earlier (sorry):

rramphal commented 4 years ago

Thank you @psvenk and @vvasuki for reviewing so quickly!

By the way, I had actually kept an .editorconfig file (ignored by Git) while I was working on the other two PRs just so that I didn't need to change my Vim indentation settings every time I opened a file. Thanks for adding an .editorconfig.

You're very welcome about the EditorConfig. It makes things so much easier, especially across a team. Also, thank you for your improvements to the .editorconfig file!

Outputting the generated file into its own directory makes for a nicer directory hierarchy, but it makes an avoidable breaking change and means that anyone using Sanscript.js via npm will need to change their include path, and the dist directory would also only contain one file.

@rramphal Let's not go for the breaking change mentioned above.

I actually decided to split out a dist folder because I thought that #19 implied that @vvasuki was only concerned that the documentation was updated. I've restored sanscript.js to its original location as requested.

Using let and const affects browser compatibility. 95.56% of browsers do have some support for let (and 97.83% for const), but I am noting this because I am not sure how much of an issue this is for Sanscript.js.

I added a step to let Babel transpile ES6 code to ES5. This is mainly to allow us to use const and let, which have become the best practice in the JavaScript community, but also because we can now target specific environments (browsers, Node versions) to ensure compatibility. That said, if @vvasuki isn't worried about the 4.44% of browsers that don't support let and the 2.17% that don't support const, I'd be glad to strip Babel out.
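
For reference, a minimal sketch of what such a setup might look like, assuming @babel/preset-env (the file name and the targets query below are illustrative, not necessarily what this PR uses):

    // babel.config.js (sketch only; the real config may differ)
    module.exports = {
        presets: [
            ["@babel/preset-env", {
                // browserslist-style targets; adjust to whatever compatibility we settle on
                targets: "defaults, maintained node versions",
            }],
        ],
    };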

ESLint seems like a bit of overkill, but I see the benefit of having consistent and correct code; we have a build step anyway, and it is only a devDependency.

I agree that ESLint, and now Babel, might be a bit of overkill, but now that we have at least six developers contributing to the codebase, tools like these allow us to use the latest, most powerful features of JavaScript while maintaining backward compatibility and a consistent style. Not to mention that some of the things ESLint caught while linting this codebase are objective improvements (flagging the reuse of function-scoped counter variables, for example). My hope is that moving forward, developers will be able to contribute solid, modern, readable code without having to worry about whitespace style consistency, team conventions, and backward compatibility. Both Babel and ESLint are industry norms, and if we are providing an npm module, we should follow industry best practices. This seems like a good middle-ground to me. At least I'm not suggesting adding webpack and converting the whole repo to TypeScript 😋!
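
(As a purely hypothetical illustration of the counter-variable point, not code from this repository: with var, a counter redeclared in a second loop is the same function-scoped binding, which a rule like no-redeclare flags; with let, each loop owns its counter.)

    // hypothetical example, not from the Sanscript.js codebase
    function demo(items) {
        for (var i = 0; i < items.length; i++) { /* ... */ }
        for (var i = 0; i < items.length; i++) { /* ... */ } // redeclares the same function-scoped `i`; ESLint flags this
    }

    // with block scoping, each loop gets an independent counter
    function demoFixed(items) {
        for (let i = 0; i < items.length; i++) { /* ... */ }
        for (let i = 0; i < items.length; i++) { /* ... */ } // separate binding, nothing to flag
    }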

The output of npm run test is extremely verbose (though there is a summary at the end) — I suspect this is due to the console.log statements. Maybe we should make these only run in the browser or remove these entirely to make the test output easier to read. We could alternatively replace all instances of console.log with log, a function that could be defined to call console.log conditionally (maybe if a variable is set for debugging).

YES — over 16,000 lines of logging! In an effort to debug this, I ended up removing the console.log lines from tests.js themselves and instead opted to add a button to test/index.html called "log schemes to console" so that users can log the data manually. I also removed the devanagari-kannada lines because I can't figure out why those calls are so special; I think they might have been remnants of some debugging. The real culprit was console.debug(map); in src/sanscript.js itself, which I ended up removing. I also added a lint rule to help us keep console.log statements out of the code we check in.
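
(If we ever want opt-in logging back, a sketch of the conditional wrapper suggested above could look like the following; the DEBUG flag is made up for illustration, and I'm assuming the lint rule in question is ESLint's no-console.)

    // sketch of an opt-in logger; DEBUG is a hypothetical flag, not something in the codebase
    const DEBUG = false; // flip to true (or wire it to a query param / env var) while debugging
    function log(...args) {
        if (DEBUG) {
            console.log(...args); // eslint-disable-line no-console
        }
    }

    log("schemes initialized"); // prints nothing unless DEBUG is true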

Because Sanscript.js is fairly simple, there probably won't be any need for minification or other things that would cause us to ship more than one file. I'm not opposed to this change in principle, but I don't see the specific benefits here.

If we decide to keep Babel, minification does become pretty easy to implement. I agree that there probably isn't much need for minification, but given that minification is also a best practice, do we want to consider it?

psvenk commented 4 years ago

@rramphal Thanks for the changes! Some comments:

If we decide to keep Babel, minification does become pretty easy to implement. I agree that there probably isn't much need for minification, but given that minification is also a best practice, do we want to consider it?

While I am not opposed to adding ESLint and Babel in this case, I would argue that something being a "best practice" does not necessarily mean that it is a good idea. For example, I gather that using JSON for configuration is considered a best practice in the JS world (and I do believe that in this particular case the benefits outweigh the costs), but JSON is not very well suited for configuration. Windows is an "industry standard" in many companies, but that is more out of convenience than out of a belief that Windows is architecturally superior to the alternatives.

As I see it, the benefits of Webpack and minification are reducing code size, which is mainly relevant for libraries and web apps with many large dependencies (particularly multi-megabyte bundles, which IMO are usually a bad practice) and not for small and focused libraries like Sanscript (which is 84K transpiled ES5).

I see that there is some merging and that the commit count is now at 26. Could you or I do a rebase to make for a cleaner history? For example, merging master into improvements to bring improvements up to date will make git log output and any history visualization look needlessly convoluted. Some details, such as that the .editorconfig file was revised, that the generated sanscript.js was moved to dist and back again, and that this branch was originally based on an older commit, are IMO not important enough for inclusion in the commit log and make the commit log harder to read — these details are still captured in the pull request discussion (which can be exported). I would be happy to perform the rebase if you prefer.

rramphal commented 4 years ago

... I would argue that something being a "best practice" does not necessarily mean that it is a good idea.

I totally agree. That's what I was trying to get at with: "Both Babel and ESLint are industry norms, and if we are providing an npm module, we should follow industry best practices. This seems like a good middle-ground to me. At least I'm not suggesting adding webpack and converting the whole repo to TypeScript". In this case, I think Babel and ESLint are good additions for what we are trying to get done. Because this project is called sanscript.js and because it is being published as an npm module, the industry practices I am referencing are those of the target consumers as well as any potential contributors. Another reason I'd be the first to agree with you is that the JavaScript ecosystem changes faster than almost any other tech field, so we must be prudent before blindly following "best practices," which I think is exactly your point.

As I see it, the benefits of Webpack and minification are reducing code size, which is mainly relevant for libraries and web apps with many large dependencies (particularly multi-megabyte bundles, which IMO are usually a bad practice) and not for small and focused libraries like Sanscript (which is 84K transpiled ES5).

I really could go either way. Once a library is released as an npm package, who knows how it will be composed and with how many other packages. It's hard to draw a line as to when minification becomes necessary, and since minification is a well-established industry convention, I think we're doing the right thing by at least thinking about it. JS devs are moving towards code-splitting and tree shaking, but not everyone is doing that, so any little bit helps. I quite like the convention CDNs have of offering both uncompressed and minified versions and letting you make the choice yourself; that is definitely something we could do here, and I would support it.
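
(If we did go that route, a hedged sketch of how the npm scripts could look, assuming terser as a devDependency; the file paths and script names below are placeholders, not what the repo actually uses:)

    {
        "scripts": {
            "build": "babel src/sanscript.js --out-file sanscript.js",
            "build:min": "terser sanscript.js --compress --mangle --output sanscript.min.js"
        }
    }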

I see that there is some merging and that the commit count is now at 26. Could you or I do a rebase to make for a cleaner history?

It is my general process to commit often and fairly neatly for feature branches, which also helps with code-reviewing large PRs. Then once the PR is ready to be merged, I use GitHub's "Squash and merge" feature, which combines the numerous commits into one commit with a roll-up of all the commit messages as its single commit message. I don't use rebasing much because it feels like rewriting history to me, but I most likely have a flawed understanding. I noticed that in #13 you rebased and force-pushed, and I also don't like force-pushing. I use "Squash and merge" regularly, and it has been good enough for what you're concerned with. There is also the "Rebase and merge" option, which I have never used, that might also work? All this to say that I don't think we need to do anything other than pick the right choice under the green button.

Hubmail is pretty cool, by the way!

psvenk commented 4 years ago

It is my general process to commit often and fairly neatly for feature branches, which also helps with code-reviewing large PRs. [...] All this to say that I don't think we need to do anything other than pick the right choice under the green button.

@rramphal Interesting. My workflow is to use "squash and merge" when it makes sense to keep everything in one commit, and a regular merge when it does not (in which case there is some value in seeing the commit history). The "rebase and merge" option just rebases the branch and does a fast-forward merge, so there is no delineation showing that the commits come from a certain pull request.

I like rebasing (and force-pushing) because I don't think rewriting history (in feature branches) is necessarily a bad thing. I see the commit log as an organizational tool, and rewriting history is like refactoring in that it makes the commit log easier for oneself and other contributors to read later on (more information at https://git-rebase.io/). I did use "squash and merge" in the other two pull requests (after manually squashing the commits with an interactive rebase).

I think there is value in keeping separate commits in this case because the changes, while individually small, are of many different natures and it would be beneficial to group the diffs by topic. Squashing and merging is one extreme, simplifying the commit history into one commit (internally using the same plumbing commands as git rebase), and merging without rebasing at all is the other extreme, keeping the entire commit history including things that would not be very useful. Most Git commands, including rebase and force push, do not permanently lose data but merely hide the data so that the surface commit log is cleaner. The history remains in the pull request thread but the specific commits are lost to people who were not directly involved in the discussion (but this is a fault of GitHub and not of Git; in the mailing list workflow older versions of patches or patchsets would be accessible in the mailing list archives).

Nonetheless I believe that in this case the benefits of rebasing outweigh the costs. The discussion captures the changes that were made and reverted pretty well.

rramphal commented 4 years ago

@psvenk, let me think about this. In the meantime, did you get a chance to review the new commits since your last review? Is there anything you want me to change code-wise?

rramphal commented 4 years ago

@psvenk, I think I see what you're saying now. I haven't ever rewritten history within a feature branch, but if it isn't a shared branch, there should be no problem with that. It really does boil down to the fact that I'm still unfamiliar with git rebase, but there's no better time to learn than the present!

I agree that there would be value in grouping related commits together thematically while reducing the number of commits. I have learned that merging master into my feature branch is really the old-school way of doing it, and I have definitely had annoying bugs come from code changes introduced via a merge-conflict resolution. I have also noticed that we never really lose commits unless we prune them, so we can still reference them if need be.

With #13, I noticed as a reviewer that I didn't like what happened to the conversation history itself after a rebase and force-push, so rather than do that, I'll branch off of this branch, rebase the new branch, and submit a new PR from the new branch. I'll close this PR and link back to it from the new PR when I'm done. For ease, I'll wait until the current PR is reviewed for code content, and then I'll take this course of action. How does that sound?
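
(For concreteness, the plan above boils down to roughly the following; the branch names are placeholders.)

    # rough sketch; "improvements" stands in for whatever this PR's branch is actually called
    git checkout improvements
    git checkout -b improvements-rebased     # new branch, so this PR's history stays untouched
    git rebase -i master                     # interactively squash/reword commits against master
    git push -u origin improvements-rebased  # push the rebased branch and open the new PR from it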

psvenk commented 4 years ago

@psvenk, let me think about this. In the meantime, did you get a chance to review the new commits since your last review? Is there anything you want me to change code-wise?

@rramphal Ok, that's fine. I saw the new commits and they look good, but I have two comments:

With #13, I noticed as a reviewer that I didn't like what happened to the conversation history itself after a rebase and force-push, so rather than do that, I'll branch off of this branch, rebase the new branch, and submit a new PR from the new branch. I'll close this PR and link back to it from the new PR when I'm done. For ease, I'll wait until the current PR is reviewed for code content, and then I'll take this course of action. How does that sound?

Ok, that sounds good. When you say that you don't like what happened to the conversation history, do you mean GitHub putting the new commits below all of the existing messages? By the way, for what it's worth I don't think code review comments are discarded upon a force-push.

rramphal commented 4 years ago

I see that path.join(dirname, "..", ...) is used in the build script. We could have const rootdir = path.dirname(dirname); near the beginning of the file and just use path.join(rootdir, ...). This would solve portability issues (if any) with Windows.

I'll make that change now. It'll also DRY up our code, which is good.
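
(Roughly what that change looks like, sketched from the comment above; the exact variable and file names in the build script are assumptions, with dirname presumably being the script's own directory, e.g. __dirname.)

    // sketch only; names are assumptions based on the comment above
    const path = require("path");
    const dirname = __dirname;                 // or however the build script defines it

    // before: repeated ".." hops
    // const outPath = path.join(dirname, "..", "sanscript.js");

    // after: compute the project root once near the top of the file
    const rootdir = path.dirname(dirname);
    const outPath = path.join(rootdir, "sanscript.js");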

I noticed that the prepare script is run when I run npm install — from reading the NPM docs this seems like standard behavior, but I just wanted to check that this is intentional and standard.

I did choose the prepare script intentionally, but honestly I don't know if it is standard at all. Do you have any suggestions?

When you say that you don't like what happened to the conversation history, do you mean GitHub putting the new commits below all of the existing messages?

Having the new commits at the bottom is fine — I think it was the unlinking of the comments from the commits that I didn't like.

rramphal commented 4 years ago

By the way, @psvenk, do you do your rebasing via the terminal or via a GUI? I'd appreciate any suggestions (I'd email you, but you don't have your email listed on GitHub haha)!

psvenk commented 4 years ago

I did choose the prepare script intentionally, but honestly I don't know if it is standard at all. Do you have any suggestions?

It seems that prepare and prepublish both have this behavior, so I would guess that this behavior is standard.
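
(For reference, npm runs the prepare script both on a plain local npm install and before npm publish, so the wiring presumably looks something like the following; the build command is a placeholder, not necessarily the project's actual script.)

    {
        "scripts": {
            "prepare": "npm run build"
        }
    }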

By the way, @psvenk, do you do your rebasing via the terminal or via a GUI? I'd appreciate any suggestions (I'd email you, but you don't have your email listed on GitHub haha)!

I use the terminal for rebasing as with most git operations (though I do use Fugitive.vim sometimes for committing). https://git-rebase.io/ explains rebasing better than I possibly can, but if you have any specific questions I would be happy to clarify for you.

rramphal commented 4 years ago

New PR #22 created, so I'm closing this PR.