indic-transliteration / sanscript.js

Transliteration package for Indian scripts
MIT License

Sanscript PHP #2

Closed wellebee closed 11 years ago

wellebee commented 11 years ago

I ported Sanscript and the test set to PHP. The PHP version supports all schemes and passes all of the tests. Are you interested in merging the PHP version into the main repository?

The only (optional) change I made to sanscript.js was to use array literals in the instance variable initializers instead of arrays created dynamically by split(). This change made the port to PHP more natural and didn't seem to weaken the JavaScript version.
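To illustrate the change described above, here is a minimal sketch of the two initializer styles. The variable names and letter lists are hypothetical and are not taken from the actual sanscript.js scheme tables; the point is only that both forms produce the same array, while the literal form can be read or copied into another language without evaluating any code.

```javascript
// Hypothetical data, not the real sanscript.js scheme tables.

// Form 1: the list is built at load time by splitting a string.
var vowelsFromSplit = 'a ā i ī u ū'.split(' ');

// Form 2: the same list written out as an array literal.
var vowelsLiteral = ['a', 'ā', 'i', 'ī', 'u', 'ū'];

// Both forms yield the same elements in the same order.
var sameContents = vowelsFromSplit.length === vowelsLiteral.length &&
    vowelsFromSplit.every(function (v, i) { return v === vowelsLiteral[i]; });

console.log(sameContents); // true
```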

Thanks.

vvasuki commented 11 years ago

Making sure Arun, the creator of the trunk gets this.

On Tue, Jan 29, 2013 at 6:14 PM, Keith Morgan notifications@github.com wrote:


You can merge this Pull Request by running

git pull https://github.com/wellebee/sanscript master

Or view, comment on, or merge it at:

https://github.com/sanskrit/sanscript/pull/2

Commit Summary

  • Changed instance variable initializers to JSON-compatible format.
  • Added sanscript.php.
  • Fixed zero in input string considered as empty.
  • Fixed problem with undefined letters.
  • Fixed problem with IAST regexp.
  • Fixed array reference problem in Kolkata scheme.
  • Fixed a problem with literal Unicode initializers.
  • Fixed a problem with ITRANS {\m+} pattern.
  • Cleaned up some style issues.
  • Added php unit tests.

File Changes

  • A ptest/SanscriptBase.php (169)
  • A ptest/SanscriptDravidian.php (42)
  • A ptest/SanscriptITRANS.php (83)
  • A ptest/SanscriptOptions.php (21)
  • A ptest/SanscriptSetup.php (60)
  • A ptest/SanscriptToggle.php (28)
  • A ptest/SanscriptTransliteration.php (163)
  • A ptest/config.xml (14)
  • A ptest/run_tests (4)
  • M sanscript/sanscript.js (294)
  • A sanscript/sanscript.php (727)


Vishvas /विश्वासः

akprasad commented 11 years ago

Forgive me for my late reply.

Yes, I'm certainly interested in merging the PHP version into the main repo. And now that you mention it, I have a Python version that might be worth merging in as well. But I'm not sure how to organize such a thing. I've tried to find some library available in multiple languages to see how the files are organized, but I haven't found a good one. Until I can find a library like the one described, I'll leave this pull request open.

I'm undecided on the changes to the JS file. Turning the split() statements into lists is certainly clearer and less error-prone. I don't think there's any need to quote the object keys, though.

vvasuki commented 11 years ago

On Fri, Apr 12, 2013 at 10:09 PM, akprasad notifications@github.com wrote:

But I'm not sure how to organize such a thing. I've tried to find some library available in multiple languages to see how the files are organized, but I haven't found a good one.

In case this helps: back when I worked on a project that used both Java and Scala code, the code was organized thus: project_name > java and project_name > scala.

Vishvas /विश्वासः

akprasad commented 11 years ago

All right, I've found a few examples of libraries available in multiple languages. less uses two separate repos, one for Ruby and one for JavaScript. Other projects seem to do something similar. The comments here also incline me toward having separate repos.

I think that's probably the best way to go; it allows the versions to be developed independently, and it's a far sight easier to organize. Granted, using separate repos might allow the libraries to drift apart a little more in terms of functionality or API. But that's better than the alternative of keeping them synchronized at all times, which would be pretty painful.

With that in mind, I'll close this pull request.

Still, these two repos should be linked somehow. I like this approach of creating a meta-repo that links to the separate languages, but it might just be easier to edit the readme or create a wiki page.

If you plan to focus on the PHP implementation, I'd encourage you to rename your repo to sanscript.php or something similar. If you'd like to host it at sanskrit/sanscript.php, let me know. Ordinarily, I would rename this repo to sanscript.js, but that might break something.

wellebee commented 11 years ago

Thanks for your reply. Regarding your comment

I'm undecided on the changes to the JS file. Turning the split() statements into lists is certainly clearer and less error-prone. I don't think there's any need to quote the object keys, though.

Turning the split statements into lists would greatly facilitate moving bug fixes or enhancements in this section into other languages such as PHP. Note too that some of the keys are quoted already (e.g., '~N', "_M"). One might argue that, for uniformity's sake, they should all be quoted. My purpose in adopting the JSON format was that it is well understood (and therefore less arbitrary than something I would propose personally) and had the important side effect of permitting me to use a very simple search-and-replace operation to produce PHP literals :)
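As a rough illustration of why a JSON-compatible layout ports easily, the sketch below renders such a value as PHP array source text. It is not the search-and-replace actually used for the port: it walks the value recursively instead, which avoids rewriting brackets that happen to occur inside strings. The helper name `toPhpLiteral` and the sample data are hypothetical.

```javascript
// Hypothetical helper: render a JSON-compatible value as PHP source text.
function toPhpLiteral(value) {
    if (Array.isArray(value)) {
        // JSON arrays become positional PHP arrays.
        return 'array(' + value.map(toPhpLiteral).join(', ') + ')';
    }
    if (value !== null && typeof value === 'object') {
        // JSON objects become PHP associative arrays with quoted keys.
        var pairs = Object.keys(value).map(function (key) {
            return toPhpLiteral(key) + ' => ' + toPhpLiteral(value[key]);
        });
        return 'array(' + pairs.join(', ') + ')';
    }
    if (typeof value === 'string') {
        // In PHP single-quoted strings, only backslash and single quote need escaping.
        return "'" + value.replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
    }
    if (value === null) {
        return 'null';
    }
    return String(value); // numbers and booleans
}

// Hypothetical sample data, not the real scheme tables.
var scheme = { vowels: ['a', 'A'], other_marks: ['M', 'H', '~'] };
console.log(toPhpLiteral(scheme));
// array('vowels' => array('a', 'A'), 'other_marks' => array('M', 'H', '~'))
```

Because every key is emitted as a quoted string, uniformly quoted keys in the JavaScript source map one-to-one onto the generated PHP.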

In summary, I'd like to see the js literals use the list notation to facilitate future collaboration; I think there's an argument to be made for quoting keys, but I'm not hard over on that.

I note in passing that an older project naming convention uses initial letters (e.g., NUnit, JUnit, PHPUnit) but I would be fine renaming this one to sanscript.php. As you suggest, I think it might be best to host it at sanskrit/sanscript.php to help consolidate efforts, assuming I would retain commit privileges.

akprasad commented 11 years ago

I agree with you on the issues of code style and uniformity. My indecision was due to concerns about the effect that the change in file size would have on serving the script online. When minified, the difference between the current master and your commit is about 1.6KB, but through some research online I've gathered that this difference is essentially insignificant when it comes to serving static files.

In summary, I'm inclined to agree with you and will probably get around to updating the code eventually.

I've created a repo at sanskrit/sanscript.php that you're free to use however you like. And of course, you have commit privileges for it as well. Let me know if you have any issues working with the repo.

wellebee commented 11 years ago

Thanks. I've moved my work to sanskrit/sanscript.php. At some point I'll rework the README to make it PHP specific and to refer to other ports.

Regarding renaming to sanscript.js, I think the risk of breaking someone's work is low. There are only two forks right now, one owned by your associate vvasuki and the other by amit, who has submitted a pull request. A quick message to them to edit their .git/config should result in little inconvenience.

Would you kindly send me your email (keith@wellebee.com) in case I need to reach you? Communicating via a closed pull request can only go so far :)

akprasad commented 11 years ago

Perhaps you're right. And it would be nice to have the project names follow a similar pattern.

Anyway, I've sent you an email. With that, I think this discussion can come to a close.