Closed sebastianruder closed 6 years ago
Thinking out loud:
Assuming that markdown tables can be parsed with something like fsm, we can probably use markdown tables + git logs for plotting and trend spotting.
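To make the parsing assumption concrete, here is a minimal sketch using only the Python standard library. The function name `parse_markdown_table` and the sample rows are hypothetical placeholders, not taken from the repo, and the sketch assumes simple pipe-delimited tables with no escaped `|` inside cells.

```python
def parse_markdown_table(text):
    """Parse one simple pipe-delimited Markdown table into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore anything that is not a table row
        # Drop the outer pipes, then split the remaining text into cells.
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if len(rows) < 3:
        return []  # need a header, a separator row and at least one entry
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


# Placeholder table, for illustration only (not real results).
SAMPLE = """
| Model   | Score |
| ------- | ----- |
| Model A | 90.0  |
| Model B | 91.5  |
"""

for entry in parse_markdown_table(SAMPLE):
    print(entry["Model"], float(entry["Score"]))
```

Running the same parser over each revision of a file from the git history would give the per-date values needed for a trend plot.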
We could also automate a bot that periodically (say, every 2 weeks) dumps the Markdown data into a more machine-readable _data folder for such usage.
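A rough sketch of what such a dump step might look like, assuming the task pages are `*.md` files in a single directory and that JSON is an acceptable format for the `_data` folder. The function names `extract_tables` and `dump_tables` and the output naming scheme are hypothetical. The scheduling itself (cron, CI, or a bot) is left out.

```python
import json
from pathlib import Path


def extract_tables(markdown):
    """Yield each run of consecutive pipe-prefixed lines as a list of rows."""
    block = []
    for line in markdown.splitlines() + [""]:  # sentinel flushes the last table
        stripped = line.strip()
        if stripped.startswith("|"):
            block.append([c.strip() for c in stripped.strip("|").split("|")])
        elif block:
            yield block
            block = []


def dump_tables(src_dir, data_dir):
    """Write every table found in src_dir/*.md to data_dir as a JSON file."""
    src_dir, data_dir = Path(src_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for md_file in sorted(src_dir.glob("*.md")):
        tables = extract_tables(md_file.read_text(encoding="utf-8"))
        for index, rows in enumerate(tables):
            if len(rows) < 3:  # header, separator and at least one entry
                continue
            header, body = rows[0], rows[2:]
            records = [dict(zip(header, row)) for row in body]
            out = data_dir / f"{md_file.stem}_{index}.json"
            out.write_text(
                json.dumps(records, indent=2, ensure_ascii=False),
                encoding="utf-8",
            )
            written.append(out)
    return written
```

Calling `dump_tables(".", "_data")` from the repository root would then write one JSON file per table. Tables with inconsistent formatting would still need the manual inspection discussed in this thread.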
@NirantK It is nowhere near that simple. Turning Markdown tables into YAML files required a lot of manual labour on my part (even with some automation): various formats, some formatting mistakes, etc.
Also, for converting tables to YAML I wrote this script: https://gist.github.com/stared/ec29b1e8d3c99a6288dcc20d77affc93
It requires some manual inspection, as:
|
)&Author2018 and <<: *Author2018 mappings
Thanks for sharing that script, @stared! Some neat hacks there.
I am hoping that if we enforced a markdown table linter of some sort, this would be slightly less tedious to do. I definitely don't claim that it is simple.
To focus on the issue at hand, I am simply asking if the loss in reader (and contributor) ease of access is worth the gain from visualizations?
Yep, a table linter or better enforcement of style guidelines is something we'd definitely want to do.
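As a starting point, a table linter could be as small as the sketch below. It only checks two things (every row has as many cells as its header, and the second row is a separator), and the name `lint_tables` and the messages are made up for illustration. Real style guidelines would likely need more rules.

```python
import re

# A separator cell looks like ---, :---, ---: or :---:
SEPARATOR_CELL = re.compile(r"^:?-+:?$")


def split_row(line):
    """Split a pipe-delimited table row into stripped cells."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def lint_tables(markdown):
    """Return (line_number, message) pairs for malformed table rows."""
    problems = []
    header_width = None  # None means "not currently inside a table"
    rows_seen = 0
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.strip().startswith("|"):
            header_width = None  # a non-table line ends the current table
            continue
        cells = split_row(line)
        if header_width is None:
            header_width, rows_seen = len(cells), 1  # this row is the header
            continue
        rows_seen += 1
        if len(cells) != header_width:
            problems.append(
                (number, f"expected {header_width} cells, found {len(cells)}")
            )
        elif rows_seen == 2 and not all(SEPARATOR_CELL.match(c) for c in cells):
            problems.append((number, "second row should be a |---|---| separator"))
    return problems
```

Run as a CI check, a non-empty result would fail the build and point the contributor at the offending line.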
So far, I haven't really seen any visualizations that added much value beyond what the tables provide. The progress visualizations at AI metrics are nice, but I don't think they're that helpful if a task doesn't have a clear metric of human performance. @stared, do you have any thoughts regarding a "killer visualization" that would clearly warrant using YAML files?
Hey @stared - just following up :)
OK, I know it is a matter of taste. Personally, I find YAML files easier to edit than Markdown tables, and less error-prone (and certainly simpler than Markdown tables plus an enforcing linter). I admit that others can have different opinions, depending on their background.
With killer features:
For contributions, I think that the tricky part is to inform where is the
(can be done easily by adding an automatic link, [edit entry in filename]).
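A minimal sketch of how such an automatic link could be generated, assuming GitHub's `/edit/<branch>/<path>` URL scheme and a `master` default branch. The helper name `edit_link` and the example path are hypothetical.

```python
from urllib.parse import quote


def edit_link(owner, repo, path, branch="master"):
    """Build a Markdown link that opens `path` in GitHub's web editor."""
    # quote() keeps "/" by default, so nested paths stay intact.
    url = f"https://github.com/{owner}/{repo}/edit/{quote(branch)}/{quote(path)}"
    return f"[edit entry in {path}]({url})"


# Hypothetical data file path, for illustration only.
print(edit_link("sebastianruder", "NLP-progress", "_data/example.yaml"))
```

The page template would emit one such link next to each rendered table, so a contributor lands directly in the right file.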
For viewing changes - by pushing to one's own repos, one can see it online.
When it comes to visualizations: true, for many areas (especially if there are only 4 entries or so) they do not provide that much additional information.
While I really like the idea of separating the presentation from the data and storing the data in a dedicated format, the benefits at this point to me seem to be overshadowed by the additional burden placed on the contributor (who might not have used YAML before) and on the reader (who won't be able to view the tables on GitHub).
As at this point the objective should be to get more data (for more tasks and languages) in this repo, these two disadvantages to me outweigh the potential upsides of using YAML.
@sebastianruder should I go ahead and refactor the Hindi and Korean pages to use Markdown?
Yes, let's do that. Thanks!
Ok. So as things stand now, I think it'll be more beneficial to the community to have things in the more readable Markdown format to facilitate reading and contributing. We can think again about converting to YAML if there's a more immediate need in the future.
I'd like to discuss here the pros and cons of using YAML going forward or whether we should stick with Markdown tables. Here are some pros and cons, mainly from @NirantK (in https://github.com/sebastianruder/NLP-progress/pull/116), @stared (in https://github.com/sebastianruder/NLP-progress/issues/43, https://github.com/sebastianruder/NLP-progress/pull/64) and myself.
Pros:
Cons:
Other opinions are welcome.