what is the form of Wikipedia ? xml or json or text?

vered1986 / HypeNET

Integrated path-based and distributional method for hypernymy detection


what is the form of Wikipedia ? xml or json or text? #3

Closed: chendi1995 closed this issue 6 years ago

chendi1995 commented 6 years ago

Thanks. I used the XML directly, but it failed with this error: "ValueError: Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: nlp.add_pipe(nlp.create_pipe('sentencizer')) Alternatively, add the dependency parser, or set sentence boundaries by setting doc[i].sent_start "

vered1986 commented 6 years ago

Thanks for pointing this out! I didn't realize it's not described here: you first need to convert the XML to text using either this or this.
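
For readers who only want to see what "convert the XML to text" involves, here is a minimal sketch using nothing but the Python standard library. It is not one of the tools linked above and not part of HypeNET. The names `iter_page_texts`, `local_name`, and `SAMPLE` are hypothetical, and the code assumes the usual MediaWiki export layout (`<page>` elements containing a `<title>` and a `<revision><text>`). It only pulls the raw wikitext out of the XML and does not strip wiki markup such as `[[links]]` or `{{templates}}`, which a dedicated extractor would also handle.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace prefix, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_texts(source):
    """Yield (title, raw_wikitext) for each <page> in a MediaWiki XML export.

    `source` is a file path or a binary file-like object. The dump is
    streamed with iterparse, so it is not loaded into memory all at once.
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the children of the finished page


# Tiny hand-written sample (an assumption for illustration, not real dump data).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Cat</title>
    <revision><text>The cat is a small mammal.</text></revision>
  </page>
</mediawiki>"""

if __name__ == "__main__":
    for page_title, page_text in iter_page_texts(io.BytesIO(SAMPLE)):
        print(page_title)
        print(page_text)
```

The resulting plain text still has to be split into sentences before parsing, which is the step the `ValueError` quoted earlier in this thread complains about.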