jeisner / treebank-scripts

Suite of scripts for preprocessing the Penn Treebank, primarily to extract lexical subcategorization frames and dependencies.
MIT License
7 stars 1 forks source link

Fixed Includes? #18

Closed josephnunn closed 8 years ago

josephnunn commented 8 years ago

online Line 52: @INC=(“.”); removes all -I command line include options, making these scripts unusable from within another script with a different working directory. Alterations to line 1 should keep the intent of line 51 without eliminating the -I command line options. See: http://stackoverflow.com/questions/2526804/how-is-perls-inc-constructed- aka-what-are-all-the-ways-of-affecting-where-pe

jeisner commented 8 years ago

Hi Joseph: thanks for the patch. This may not be portable since I think the parsing of options on a shebang line varies across systems. It's probably safe if you write the shebang line using -I. without whitespace (rather than -I . with whitespace). Alternatively, would it work to do unshift @INC, '.'; or just use '.'; at the start of the script?

josephnunn commented 8 years ago

Hello Jason,

To be honest, I don’t need the shebang line option at all. I call your scripts from another directory using the following format:

perl -I <scriptsdir> <scriptsdir>/oneline <datadir>/*

The problem is setting @INC to ‘.’ wipes out my -I option and without it your scripts that import other scripts won’t find them, this is because this master script is running in a different working directory so setting @INC to ‘.’ fails because ‘.' is the current working directory not your script directory in this case.

I only need to comment line 52 in oneline and everything works fine for me. Perhaps a smarter solution is needed?

I just chose the quick route and thought you should judge.

Joseph

On May 22 , 2016, at 7:59 PM, Jason Eisner notifications@github.com wrote:

Hi Joseph: thanks for the patch. This may not be portable since I think the parsing of options on a shebang line varies across systems. It's probably safe if you write the shebang line using -I. without whitespace (rather than -I . with whitespace). Alternatively, would it work to do unshift @INC https://github.com/INC, '.' or just use '.' at the start of the script?

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/jeisner/treebank-scripts/pull/18#issuecomment-220877900

jeisner commented 8 years ago

Well, I think the right approach is to augment @INC with the directory containing the current executable ($0). This seems to be how all the other scripts do it, by calling &stamp (which calls &fixprog).

oneline also calls &stamp, so I'm not sure why it needs to modify @INC as well. I could probably just delete that line as you've done.

In fact, thanks to &stamp, you shouldn't have to invoke the scripts using perl -I. Just invoking /oneline should do it, or putting

on your PATH and invoking oneline. For some reason, stamp declines to modify @INC if you type something like "perl oneline" so that oneline isn't found on your PATH (but it should do the right thing in all other cases, including where "." is on your PATH). There's even a comment about it; I can't see why skipping this case is desirable, so maybe it's an apologetic comment and I should patch stamp as well. Anyway, I can fix and commit. Any further comments before I do?
josephnunn commented 8 years ago

Hello Jason,

I took a look at the $0 variable you mention here http://perldoc.perl.org/perlvar.html http://perldoc.perl.org/perlvar.html but I’m not sure it has your desired effect.

When another script in another directory runs oneline, I believe oneline inherits its working directory instead of using the directory it itself is in. I know for a fact that if I do not supply the ‘-I ’ command line option that your scripts that include other scripts cannot find them. In oneline’s case, here is the message I get when I leave that option off.

Can't locate stamp.inc in @INC (@INC contains: . /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2/darwin-thread-multi-2level /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 .) at jeisner-scripts/oneline line 53.

So you see that @INC already includes ‘.’ but that does not work. You could add a specific location to @INC, but that is not portable. So my master script has a editable variable pointing to , or in my personal case, a symbolic link to it, and I use the -I command line option which adds the supplied directory to Perl’s @INC variable for that invocation. Unfortunately directly setting @INC to ‘.’ overwrites these -I command line options.

For me my approach works fine, I don’t have to change your scripts (excepting commenting that one line), or any part of my local configuration. All the relevant bits are localized to my master script which is part of the project I’m working on with Sean. However I do not know how you generally use your scripts, so I don’t know how you would be affected by deleting or commenting that line.

On looking at the link above, but for @INC instead of $0, the link claims ‘.’ is automatically added to @INC at the end of its construction in every case. So I believe that you can safely delete or comment out that line and everything should work for you as normal even if you rely on @INC having ‘.' for proper functioning.

Joseph

On May 23 , 2016, at 7:00 AM, Jason Eisner notifications@github.com wrote:

Well, I think the right approach is to augment @INC with the directory containing the current executable ($0). This seems to be how all the other scripts do it, by calling &stamp (which calls &fixprog).

oneline also calls &stamp, so I'm not sure why it needs to modify @INC as well. I could probably just delete that line as you've done.

In fact, thanks to &stamp, you shouldn't have to invoke the scripts using perl -I. Just invoking /oneline should do it, or putting

on your PATH and invoking oneline. For some reason, stamp declines to modify @INC if you type something like "perl oneline" so that oneline isn't found on your PATH (but it should do the right thing in all other cases, including where "." is on your PATH). There's even a comment about it; I can't see why skipping this case is desirable, so maybe it's an apologetic comment and I should patch stamp as well. Anyway, I can fix and commit. Any further comments before I do? — You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/jeisner/treebank-scripts/pull/18#issuecomment-220988336
jeisner commented 8 years ago

I've just deleted the line that munges @INC, which I think was probably accidentally left over from an earlier state of the code. The other scripts simply call stamp without doing this.

With this change, I can run oneline fine when running it from another script in another directory. I'm not sure how require('stamp.inc') manages to find stamp.inc in the same directory as oneline. But it does, without using the -I option. Are you sure this doesn't work for you?

josephnunn commented 8 years ago

Hello Jason,

Well I pasted the error I get when I don’t use -I in the last email, if that does’t convince you I don’t know what will :)

It could be the location of the script directory on your machine is included on some perl library path in your environment.

Thanks for making the change!

Joseph

On May 23 , 2016, at 3:49 PM, Jason Eisner notifications@github.com wrote:

I've just deleted the line that munges @INC https://github.com/INC, which I think was probably accidentally left over from an earlier state of the code. The other scripts simply call stamp without doing this.

With this change, I can run oneline fine when running it from another script in another directory. I'm not sure how require('stamp.inc') manages to find stamp.inc in the same directory as oneline. But it does, without using the -I option. Are you sure this doesn't work for you?

— You are receiving this because you authored the thread. Reply to this email directly or view it on GitHub https://github.com/jeisner/treebank-scripts/pull/18#issuecomment-221119054

jeisner commented 8 years ago

Good guess - you're right! :) stamp.inc has to be a directory listed in the PERL5LIB or PERLLIB variable.
Then you don't need to use the -I flag.
I've added both approaches to the documentation. Thanks again for reporting this.