NikantVohra opened 11 years ago
@NikantVohra I'll have to check whether I still have that; I'll let you know. Meanwhile, have you talked to @ftyers about this, i.e. whether he still has one?
I'd be more than happy to write one for you if both options are negative. :)
ftyers has provided me with two scripts. I will try them and get back to you if I need any help. :)
On Thu, 4 Jul 2013 at 01:09 -0700, NikantVohra wrote:
Hey, I am not able to extract wiki data using these scripts: http://pastebin.com/ugUYNfC2 http://pastebin.com/LwhJwCnu
Can you help with that?
Can you at least say what you tried?
F.
Are you trying to write/use a web crawler to download pages from Wikipedia, or are you trying to extract words from already downloaded pages?
The two scripts that Fran gave you will help to extract words from already downloaded pages, but if you need a script to download the pages in the first place, these are not the things you are looking for.
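Fran's scripts are only linked, so their exact behaviour isn't shown here. To illustrate what "extracting words from already downloaded pages" involves, here is a minimal Python sketch using only the standard library. It assumes the pages were saved as HTML and that a "word" is a run of Devanagari letters and signs; `TextExtractor`, `words_from_html` and `frequency_list` are hypothetical names, not part of those scripts.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Assumption: a "word" is a run of Devanagari letters and signs
# (U+0900-U+0963, U+0971-U+097F). Dandas (U+0964/U+0965), Devanagari
# digits and the abbreviation sign (U+0966-U+0970) act as separators.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0971-\u097F]+")


class TextExtractor(HTMLParser):
    """Collects text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def words_from_html(html):
    """Return the Devanagari words of an HTML page, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return DEVANAGARI_WORD.findall(" ".join(parser.parts))


def frequency_list(html_pages):
    """Count word frequencies across several downloaded pages."""
    counts = Counter()
    for html in html_pages:
        counts.update(words_from_html(html))
    return counts
```

For example, `words_from_html('<p>मेरा घर।</p>')` returns `['मेरा', 'घर']`: the danda is dropped because it falls outside the character class.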
You can download the pages with curl or a web crawling framework like Scrapy (http://doc.scrapy.org/en/latest/intro/tutorial.html). Even writing a rudimentary crawler is quite easy with basic Python.
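To make the "rudimentary crawler" remark concrete, here is a minimal breadth-first crawler in Python using only the standard library. It is a sketch, not the script offered in this thread: `crawl`, `fetch_url` and `LinkCollector` are hypothetical names, the User-Agent string is a placeholder, and it does not check robots.txt. For bulk Wikipedia text, the database dumps are generally a better source than crawling.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download one page as text. The User-Agent value is a placeholder."""
    request = Request(url, headers={"User-Agent": "corpus-sketch/0.1 (contact: you@example.org)"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, max_pages=50, delay=1.0, fetch=fetch_url):
    """Breadth-first crawl that stays on the start URL's host.

    Returns a dict mapping URL -> HTML. `fetch` can be swapped out so the
    crawler can be exercised without network access.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # URLError, HTTPError and timeouts are all OSErrors
            continue
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        time.sleep(delay)  # be polite to the server between requests
    return pages
```

Passing `fetch` as a parameter keeps the traversal logic separate from the network code, so the same loop can be tested against an in-memory dictionary of pages.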
If that is the case, I'll try to write you one tomorrow :) I've got my hands full with a lot of stuff, so please be a little patient.
Thanks @darthxaher. But it is fine now. Fran gave me the link to the wiki dumps for Hindi Wikipedia pages, so I do not need to implement the web crawler. I can just extract the data from the dump and use the scripts to form the corpus. I will report back to you once I have the coverage for the dictionaries.
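Working from the dump, the extraction step might look like the following minimal Python sketch. It assumes a standard MediaWiki XML export compressed with bzip2 (for Hindi, a file named along the lines of `hiwiki-latest-pages-articles.xml.bz2`; the exact name is an assumption), keeps only main-namespace articles that are not redirects, and strips markup with a few regular expressions. That stripping is deliberately crude (wiki tables, file-link options and `''`/`'''` formatting survive), so treat it as a starting point rather than a replacement for the scripts Fran provided. All function names are hypothetical.

```python
import bz2
import re
import xml.etree.ElementTree as ET

# Innermost {{template}}; applied repeatedly to peel off nested templates.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")
# [[target|label]] -> label, [[target]] -> target
WIKILINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]]*)\]\]")
# Leftover HTML-like tags such as <ref> or <br />
TAG = re.compile(r"<[^>]+>")


def local_name(tag):
    """Drop the versioned MediaWiki namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_texts(fileobj):
    """Yield (title, wikitext) for main-namespace, non-redirect pages."""
    title, ns, text, redirect = None, None, "", False
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "ns":
            ns = elem.text
        elif name == "redirect":
            redirect = True
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            if ns == "0" and not redirect:
                yield title, text
            title, ns, text, redirect = None, None, "", False
            elem.clear()  # release the page subtree; dumps are large


def strip_wikitext(text):
    """Very crude markup removal: templates, links and tags only."""
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    text = WIKILINK.sub(r"\1", text)
    return TAG.sub("", text)


def build_corpus(dump_path, out_path):
    """Write one line of plain text per article from a .xml.bz2 dump."""
    with bz2.open(dump_path, "rb") as dump, open(out_path, "w", encoding="utf-8") as out:
        for _title, wikitext in iter_page_texts(dump):
            line = " ".join(strip_wikitext(wikitext).split())
            if line:
                out.write(line + "\n")
```

Using `iterparse` streams the dump instead of loading the whole XML tree into memory, which matters for multi-gigabyte exports.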
Hey, here are my results for the corpus obtained from Wikipedia:
http://wiki.apertium.org/wiki/Hindi_and_English/Results
The morphological analyser gives results similar to those for the previous corpus, but the bilingual dictionary shows a drop of about 4% in translation accuracy.
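One common way to measure morphological-analyser coverage on a corpus like this is naive coverage: the share of tokens that receive at least one analysis. A minimal Python sketch, assuming the corpus has already been run through the pair's analyser (lt-proc) and the output saved as text in the Apertium stream format; `naive_coverage` is a hypothetical helper, and it does not reproduce the translation-accuracy figure.

```python
import re
from collections import Counter

# One lexical unit of the Apertium stream format: ^surface/analysis1/analysis2$
# Backslash-escaped characters (e.g. "\/" or "\$") are allowed inside.
LEXICAL_UNIT = re.compile(r"\^((?:\\.|[^/\\$])*)/((?:\\.|[^\\$])*)\$")


def naive_coverage(analysed):
    """Share of tokens with at least one analysis, plus the unknown words.

    lt-proc marks a word missing from the dictionary as ^word/*word$,
    so a unit whose analysis starts with '*' counts as unknown.
    Every unit counts as a token, including punctuation.
    """
    total = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed):
        total += 1
        if analyses.startswith("*"):
            unknown[surface] += 1
    if total == 0:
        return 0.0, unknown
    return (total - sum(unknown.values())) / total, unknown
```

For `^मेरा/मेरा<prn><pos>$ ^घर/घर<n><m>$ ^xyz/*xyz$^।/।<sent>$` this gives a coverage of 0.75, with `xyz` as the only unknown word. `unknown.most_common(20)` then lists the most frequent missing words, a natural starting point for dictionary work.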
Hi Nikant, can you please share the Hindi corpus? We need it very urgently.
Hi! @NikantVohra, can you please share the Hindi corpus?
@darthxaher I am trying to create a Hindi corpus by crawling Wikipedia in order to get a better idea of the coverage of the dictionaries. Do you have a script for this?