Open tattorba87 opened 4 years ago
Or even better:
xmlstarlet sel -t -m '//*[local-name()="div"][@type="article"]//*[local-name()="p" or local-name()="head"]/text()' -n --var linebreak -n --break -v "translate(., \$linebreak, '')" Annee/.xml | perl -pe 's/^ +//g ; s/^ (.+)/$1\n/g; s/ +/ /g' > est_republicain.txt
Instead of using:
xmllint --xpath '//*[local-name()="div"][@type="article"]//*[local-name()="p" or local-name()="head"]/text()' Annee/.xml | perl -pe 's/^ +//g ; s/^ (.+)/$1\n/g ; chomp' > est_republicain.txt
this seems to work better:
xmlstarlet sel -t -v '//*[local-name()="div"][@type="article"]//*[local-name()="p" or local-name()="head"]/text()' Annee/.xml | perl -pe 's/^ +//g ; s/^ (.+)/$1\n/g ; chomp' > est_republicain.txt
The reason is that xmllint was replacing several French characters with their hex format, whereas xmlstarlet doesn't seem to have this issue.