uclnlp / jack

Jack the Reader
MIT License
257 stars 82 forks source link

[WIP] Draft for makefile commands + docs update; #297 #307 #308. #309

Closed TimDettmers closed 6 years ago

TimDettmers commented 6 years ago

This introduces a draft for download data and preprocessing files with make commands. It would be easy for users to use this, and it would be easy for users to extend this. Currently, it is a mess and it takes ages to understand how to run a simple model. I know Jack and even for me it took ages. This is not good. Let us also discuss different options how to make this better.

Currently the make command for SQuAD is broken, because the processing is not done correctly; see #308.

TimDettmers commented 6 years ago

See also the make list command to list all models. This is more immediate so that users can use the same procedure, make file calls, to (1) find out what they can run, (2) do data download and preprocessing, (3) run simple models. I think this would be a good way to go.

codecov-io commented 6 years ago

Codecov Report

Merging #309 into master will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #309   +/-   ##
=======================================
  Coverage   16.73%   16.73%           
=======================================
  Files          72       72           
  Lines        4254     4254           
=======================================
  Hits          712      712           
  Misses       3542     3542

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 40fbde5...f303853. Read the comment docs.

dirkweissenborn commented 6 years ago

@TimDettmers I like this PR after addressing the minor issues I mentioned and fixing some of the problems you had (e.g., fix download scripts). Is there anything else that is blocking this PR? I think we should get that in soon.

dirkweissenborn commented 6 years ago

I updated readme and the download scripts, should be better now

TimDettmers commented 6 years ago

If we want to use the Makefile as "documentation" is may make sense to include both types of invocation, that is via config and via command line parameters. What do you think?

I will address the changes later today or tomorrow.

dirkweissenborn commented 6 years ago

Whatever you think is better. Personally, I think the using task specific configs with minor overwrites on the command line are nice. That means that the task config gives you a reasonable training setup (hyperparameters) such that you only need to define the model, model_dir etc on the command line.

TimDettmers commented 6 years ago

Hmm, that is true. I like the config + overwrite better. However, we would need to work on something that makes clear to the user what parameters are available and what is their syntax/name. Some models have different parameter options than others. How do we generalize here?

Maybe its better to just list all of them for a start? Just --help parameters or something, and we also have --help models and --help datasets and when you type --help it just returns a list of these commands, what do you think?

undwie commented 6 years ago

Is this good to merge? Maybe we can add a "config introspection" issue for 0.2?

dirkweissenborn commented 6 years ago

@undwie this is not yet ready, see my comments (although partially outdated) above.

TimDettmers commented 6 years ago

I fixed the SNLI commands now, but the SQuAD commands run on errors even after rebase. I think some embedding config value is missing, but this is not clear from the error and not very user-friendly. I also tried the command that you proposed and it also does not because my memory blew up (I think this is GloVe, is it?). Can you fix this @dirkweissenborn?

@undwie we should fix these issues before a release, otherwise people cannot really use jack. If someone that worked on jack (me) cannot use it after 10 minutes of trying to figure out how to run something without a config then this is a pretty bad design. Also, the first release is crucial for long-term success.

I do not think Jack is ready for a release without a fix for these user experience problems.

undwie commented 6 years ago

Note that 0.1 wasn't supposed to be the "let's get it out there release" but a pre-step to that. I am mentioning this because we run the risk of constantly moving goal posts.

But yes, what you say generally true! My point was about ideas like "listing all configuration items automatically" etc. which are great ideas, but I think good examples/readme's can deal with this for the time being. I managed to run FastQA relatively easily by just looking at the configuration files and readme. Maybe that's because I am okay with using config files (if I am explicitly said to use these).

In any case, we should improve usability dramatically, but maybe not all in 0.1? In fact, my wish for 0.2 would be to mostly focus on usability improvements!