sgsinclair / Voyant

GNU General Public License v3.0

Need help splitting a corpus into multiple documents with XPath #518

Closed wilcar closed 3 years ago

wilcar commented 3 years ago

I have a press corpus with multiple articles from different newspapers, which I treat as authors. I want to do text mining that distinguishes between the different authors. I have an XML file and I am a beginner: can you help me fill in the import options? Thank you for your help.

Here is the beginning of my XML file:

    <?xml version="1.0" encoding="UTF-8"?>
    <root encoding="UTF-8">
      <record>
        <content>
          EVENEMENT, jeudi 12 mars 1998 555 mots, p. 4&#13;
          "Le plus complexe, c'est l'information du malade". Un médecin réanimateur a mené une&#13;
          enquête sur les attentes des patients.&#13;
        </content>
        <author>Libération</author>
        <dates>jeudi 12 mars 1998</dates>
        <publication_date>1998-03-12</publication_date>
        <longueur>5129</longueur>
      </record>
    </root>

ajmacdonald commented 3 years ago

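Try the following XPaths in the import options (the element in your file is named content, so the path is //content):

contenu (content): //content
auteur (author): //author
documents: //record
date de publication (publication date): //publication_date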
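
If you want to check what those expressions select before uploading, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not how Voyant processes the file internally. The function name split_records is hypothetical, the sample is a shortened copy of the record posted above, and the sketch assumes the content, author and date paths are evaluated inside each record.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record from the question (content text abbreviated).
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<root encoding="UTF-8">
  <record>
    <content>EVENEMENT, jeudi 12 mars 1998 555 mots, p. 4</content>
    <author>Libération</author>
    <dates>jeudi 12 mars 1998</dates>
    <publication_date>1998-03-12</publication_date>
    <longueur>5129</longueur>
  </record>
</root>
"""


def split_records(xml_bytes):
    """Return one dict per <record>, mirroring the XPaths suggested above.

    Hypothetical helper: assumes the per-field paths are resolved
    relative to each record element.
    """
    root = ET.fromstring(xml_bytes)
    docs = []
    # documents: //record  (ElementTree spells this ".//record")
    for record in root.findall(".//record"):
        docs.append({
            # contenu: //content
            "content": (record.findtext(".//content") or "").strip(),
            # auteur: //author
            "author": (record.findtext(".//author") or "").strip(),
            # date de publication: //publication_date
            "publication_date": (record.findtext(".//publication_date") or "").strip(),
        })
    return docs


docs = split_records(SAMPLE.encode("utf-8"))
for doc in docs:
    print(doc["author"], doc["publication_date"], doc["content"][:30])
```

Each record becomes one document, and the author value is what lets you compare newspapers afterwards. An element name that does not exist in the file (for example a plural contents) selects nothing, so it is worth matching the names exactly.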

wilcar commented 3 years ago

Thank you for helping. It works great.