Open beepsoft opened 3 years ago
@dlazesz Balázs, I guess the `.fig` format may be thrown out, as only few :) people are eager to work with it. Do you have any idea about a convenient format which is suitable for shared work?
@sassbalint thanks for picking up this issue!
You mean for replacing `emtsv_modules.pdf`, or what would this `.fig` be used for? Unfortunately I have no idea about this.
For the record. The FIG is meant to be edited and then converted to the PDF. Bálint (@sassbalint) used to maintain the FIG.
As both Bálint and I have left the project, I proposed that Noémi (@vadno) could do a one-time rewrite in Tikz to enable it for others to edit it more conveniently in the future as new modules emerge. I do not want to speak on her behalf.
I have no other ideas how it would be easier for everybody to maintain the figure or who would actually do it in the first place. All ideas, suggestions and applications for maintaining are welcome!
@beepsoft You could send PRs on the documentation (or any part of the project) if you have any ideas how to improve it.
@dlazesz Balázs, could you draw (by hand!) a figure on the current state of the system? If yes, we could talk about it on zoom and then I will create a new version (in `.fig` ...).
@beepsoft `.fig` is to be edited by `xfig`, which is an old but very good quality piece of software, I think.
As @dlazesz mentioned, I'll draw a tikz version of the figure. @sassbalint, xfig is great, but for me tikzpicture is a bit easier to use. I'll try to do it asap... OK?
Thank you, @vadno Noémi. :)
While, as Balázs put it, "one-time rewrite in Tikz to enable it for others to edit it more conveniently in the future" sounds good, I guess that there is a chance that by creating the Tikz version you just take over this task for a long time, in practice. Are you OK with this? :)
@sassbalint No, I'm not OK with this :) I'll try to write it as clearly as possible, hoping that later others can extend it without my help. But of course I'll help if needed ;)
UPDATE: Thanks to @vadno the new module figure design has been committed: https://github.com/nytud/emtsv/blob/master/docs/emtsv_modules.pdf
Hopefully it can better handle the growing number of modules. We plan to restructure and maybe split the figure as more input-output modules are planned in the near future.
I keep this issue open as the current update does not solve the OP, it just tries to ease the situation. More documentation is on its way.
emtsv is a really great tool, thanks for your work!
I'm all new to NLP so maybe that's the reason for all my problems, but only reading the documentation it is rather difficult to work effectively with `emtsv`.
One main thing I miss from the documentation is what each module's input and output is:
https://github.com/dlt-rilmta/emtsv#modules
For example, if I want to use the `chunk` module, I don't know what data it needs so that it can run. Starting naively like this:
... I get this error:
That's fine, but which module will generate `'form', 'xpostag'`? After some trial and error I could figure out that I need `tok,morph,pos,chunk`, but this is a tedious way to find it out.
The topology description is somewhat helpful (https://github.com/dlt-rilmta/emtsv/blob/master/docs/emtsv_modules.pdf) but it uses the "package names" instead of the module names expected by `emtsv`. E.g. it contains `emToken` while in `emtsv` it needs to be referenced as `tok`.
It would also be great to know what each column in the result actually means and how these columns should be interpreted. This is also something really difficult to find out even after reading a lot of publications related to `emtsv` and `e-magyar`.
So, a nice documentation structure for someone just getting started with `emtsv`
would be something like this:
1. […]
2. […]
3. […] `emtsv` (tok, morph, etc)
4. […] (`form`, `anas`, `xpostag`, etc.)
1-2. are already available, 3. and 4. are what I am missing.
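To illustrate what point 3 could make possible: if each module's input and output columns were documented, the `tok,morph,pos,chunk` chain found above by trial and error could be derived mechanically. The following is only a minimal sketch and not part of `emtsv`. The `MODULES` table and the `resolve_chain` helper are hypothetical, and the columns listed for `tok`, `morph` and `pos` (as well as the output of `chunk`) are assumptions made for illustration. Only the `'form', 'xpostag'` requirement of `chunk` is taken from the error mentioned above.

```python
# Hypothetical input/output table, NOT read from emtsv itself.
# Only chunk's "needs" ('form', 'xpostag') comes from the error in this issue;
# every other column assignment below is an assumption for illustration.
MODULES = {
    "tok": {"needs": set(), "gives": {"form"}},
    "morph": {"needs": {"form"}, "gives": {"anas"}},
    "pos": {"needs": {"form", "anas"}, "gives": {"lemma", "xpostag"}},
    "chunk": {"needs": {"form", "xpostag"}, "gives": {"NP-BIO"}},
}


def resolve_chain(target, modules=MODULES):
    """Return the modules that must run, in order, so that `target` gets its inputs.

    Assumes the dependency table is acyclic and each column has one producer.
    """
    # Map every column to the module that produces it.
    producers = {col: name for name, io in modules.items() for col in io["gives"]}
    chain = []

    def visit(name):
        if name in chain:
            return  # already scheduled
        for col in sorted(modules[name]["needs"]):
            if col not in producers:
                raise KeyError(f"no module produces column {col!r}")
            visit(producers[col])  # schedule the producer first
        chain.append(name)

    visit(target)
    return chain


print(",".join(resolve_chain("chunk")))  # prints: tok,morph,pos,chunk
```

With such a table in the documentation, the same lookup could also be done by hand: start from the columns a module needs and walk back to the modules that produce them.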