drdhaval2785 closed this issue 7 years ago.
In the current S3 backup, there is no bucket devoted to just the xml forms of the dictionaries.
More specifically, you want the fetch to be of a zip-compressed version of a particular X.xml, right?
This should be doable. Will put on todo list.
Yes, zip, tar.gz, or any other compressed format will do.
Have added an xml file to the S3 backup regimen that is part of each dictionary update.
Since only acc has been updated since this regimen was added, only the acc dictionary has such a file.
Here is a script to download the xml file for a dictionary:
#!/bin/sh
# xmldownload.sh: download the zipped XML file for one dictionary from S3.
# Takes a single argument, a dictionary code (e.g. acc).
if [ -z "$1" ]; then
  echo "script requires a dictionary code as parameter"
  echo "Usage: sh xmldownload.sh <dictcode>"
  echo "<dictcode> must be one of the dictionary codes"
  echo "see http://www.sanskrit-lexicon.uni-koeln.de/"
  exit 1
fi
# convert the dictionary code to lower case
DICT=$(echo "$1" | tr '[:upper:]' '[:lower:]')
echo "downloading ${DICT}_xml.zip ..."
# -f: fail on HTTP errors instead of saving the S3 error page as the zip
curl -f -o "${DICT}_xml.zip" "http://s3.amazonaws.com/sanskrit-lexicon/blobs/${DICT}_xml.zip"
Assume this script is named xmldownload.sh.
Usage example: sh xmldownload.sh acc
This results in a download of 'acc_xml.zip' from S3.
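A small helper can catch mistyped dictionary codes before calling curl. This is a minimal sketch, not part of the scripts above: the function name `xml_url` is hypothetical, and the list of codes is copied from updatexml.sh later in this thread on the assumption that it is complete.

```shell
#!/bin/sh
# Hypothetical helper: validate a dictionary code and print its S3 URL.
# ASSUMPTION: the code list (copied from updatexml.sh) is complete and current.
DICT_CODES="acc ae ap ap90 ben bhs bop bor bur cae ccs gra gst ieg inm krm mci md mw mw72 mwe pd pe pgn pui pw pwg sch shs skd snp stc vcp vei wil yat"

xml_url() {
  # normalise to lower case, as xmldownload.sh does
  dict=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for code in $DICT_CODES; do
    if [ "$code" = "$dict" ]; then
      printf 'http://s3.amazonaws.com/sanskrit-lexicon/blobs/%s_xml.zip\n' "$dict"
      return 0
    fi
  done
  echo "unknown dictionary code: $1" >&2
  return 1
}
```

Usage: `url=$(xml_url ACC) && curl -f -o acc_xml.zip "$url"`. The `&&` keeps curl from running with an empty URL when the code is unknown.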
acc_xml.zip, when unzipped, has the following structure:
Maybe the script should do the renaming.
Makes sense.
Quick one.
As most of our scripts are indempont, could you please run a (potentially empty) update on all dicts, so that the current XML files become available for all of them?
I am asking because I last updated my local copies of the Cologne dicts a year ago, so I need fresh copies.
Have generated all the xxx_xml.zip files.
Note: total size of all is about 110MB.
'indempont': did you mean 'idempotent'?
Hurray!
Idempotent. I had a badly wrong impression of the word in my mind. Thanks for the correction.
This works well. Closing the issue.
https://github.com/sanskrit-lexicon/cologne-stardict/blob/master/updatexml.sh
#!/bin/bash
# updatexml.sh: download and unzip the XML files of all dictionaries.
# Takes no arguments; uses a bash array, so run it with bash, not sh.
dictList=(acc ae ap ap90 ben bhs bop bor bur cae ccs gra gst ieg inm krm mci md mw mw72 mwe pd pe pgn pui pw pwg sch shs skd snp stc vcp vei wil yat)
# make sure the output directories exist
mkdir -p input/zips input/extracted
# download each dictionary's zip into input/zips
for DICT in "${dictList[@]}"
do
echo "downloading ${DICT}_xml.zip ..."
curl -f -o "input/zips/${DICT}_xml.zip" "http://s3.amazonaws.com/sanskrit-lexicon/blobs/${DICT}_xml.zip"
done
# unzip each archive into input/extracted, overwriting older files
cd input/extracted || exit 1
for DICT in "${dictList[@]}"
do
echo "unzipping ${DICT}_xml.zip ..."
unzip -o "../zips/${DICT}_xml.zip"
done
This is the code that works for me for now.
Header files are missing in the following dictionaries:
AP BHS BOP BOR CAE CCS GRA GST IEG INM KRM MCI MW PD PE PGN PUI PW PWG SHS SKD SNP STC VCP VEI WIL YAT
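A list like this can be produced by checking the extracted files directly. The sketch below is an illustration, not the check that was actually used: the function name `missing_headers` is hypothetical, and because the internal layout of the zips is not spelled out in this thread, it searches the whole extraction tree (for example input/extracted from updatexml.sh) with find.

```shell
#!/bin/sh
# Hypothetical check: print (in upper case) each dictionary code that has
# no <dict>header.xml anywhere under the given extraction directory.
# Usage: missing_headers DIR dictcode...
missing_headers() {
  dir=$1
  shift
  for dict in "$@"; do
    # find prints nothing when the file is absent (or DIR does not exist)
    if [ -z "$(find "$dir" -name "${dict}header.xml" -print 2>/dev/null)" ]; then
      printf '%s\n' "$dict" | tr '[:lower:]' '[:upper:]'
    fi
  done
}
```

For example, `missing_headers input/extracted acc ap bhs` would print AP and BHS on separate lines if only accheader.xml were present.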
The missing xxxheader.xml is a bug.
The creation of the xxx_xml.zip file assumes that xxxheader.xml is in the pywork directory.
Originally, xxxheader.xml was kept in the downloads directory.
I have moved xxxheader.xml to pywork only in a haphazard way.
Need to write a script to do this systematically.
And then regenerate all the S3 xxx_xml.zip files.
On todo list, high priority.
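One possible shape for such a script, as a sketch only: the layout ROOT/<dict>/downloads and ROOT/<dict>/pywork is an assumption made for illustration, and `copy_headers` is a hypothetical name. It copies rather than moves, so the original stays in downloads until the new location is verified, and re-running it is harmless.

```shell
#!/bin/sh
# Hypothetical sketch: copy <dict>header.xml from downloads into pywork.
# ASSUMPTION: directories are ROOT/<dict>/downloads and ROOT/<dict>/pywork.
# Usage: copy_headers ROOT dictcode...
copy_headers() {
  root=$1
  shift
  for dict in "$@"; do
    src="$root/$dict/downloads/${dict}header.xml"
    dst="$root/$dict/pywork/${dict}header.xml"
    if [ -f "$dst" ]; then
      # already done: running the script again changes nothing
      echo "$dict: already in pywork"
    elif [ -f "$src" ]; then
      mkdir -p "$root/$dict/pywork"
      cp "$src" "$dst"
      echo "$dict: copied"
    else
      echo "$dict: no header found" >&2
    fi
  done
}
```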
Ouch, that list is too scary to even look at.
All the xxx_xml.zip S3 backups have been regenerated; all should contain xxxheader.xml files.
I think this issue can be closed.
Nowadays it has become important for me to fetch the latest DICT.xml files of various dictionaries to generate Stardict files from them. Currently I fetch the data from the Amazon S3 server that Jim described some years ago, but this fetches a whole lot of other stuff along with it.
Would it be possible to provide a script that fetches only the latest .xml file and nothing else?
e.g.
sh fetchxml.sh ap
or python fetchxml.python ap
to fetch the latest ap.xml file? The folder could contain two items: the script, and a subfolder output where all the fetched files go. Doable, @funderburkjim?
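Until something official exists, a client-side approximation is possible with the existing S3 zips. This is a hedged sketch, not a script from the repository: it assumes the URL pattern used by xmldownload.sh and assumes each zip contains a file named <dict>.xml (top level or one directory deep), since the zip layout is not shown here. It downloads to a temporary file and extracts only that one file into output/.

```shell
#!/bin/sh
# fetchxml.sh (hypothetical sketch): fetch only <dict>.xml into output/.
# ASSUMPTIONS: S3 URL pattern as in xmldownload.sh; the zip contains <dict>.xml.
fetch_xml() {
  dict=$(printf '%s' "${1:-}" | tr '[:upper:]' '[:lower:]')
  outdir=${2:-output}
  if [ -z "$dict" ]; then
    echo "no dictionary code given" >&2
    return 1
  fi
  mkdir -p "$outdir"
  tmpzip=$(mktemp) || return 1
  # -f: fail on HTTP errors instead of saving an S3 error page as the zip
  if ! curl -f -s -o "$tmpzip" "http://s3.amazonaws.com/sanskrit-lexicon/blobs/${dict}_xml.zip"; then
    echo "download failed for $dict" >&2
    rm -f "$tmpzip"
    return 1
  fi
  # -j drops directory names, -o overwrites an older copy in $outdir;
  # the two patterns cover <dict>.xml at the top level or in a subdirectory
  unzip -q -o -j "$tmpzip" "${dict}.xml" "*/${dict}.xml" -d "$outdir" 2>/dev/null
  rm -f "$tmpzip"
  if [ -f "$outdir/${dict}.xml" ]; then
    echo "saved $outdir/${dict}.xml"
  else
    echo "no ${dict}.xml found in ${dict}_xml.zip" >&2
    return 1
  fi
}

if [ $# -gt 0 ]; then
  fetch_xml "$1"
else
  echo "Usage: sh fetchxml.sh <dictcode>" >&2
fi
```

Usage: `sh fetchxml.sh ap` should leave ap.xml in output/. The whole zip is still downloaded, so this saves disk space rather than bandwidth; fetching only the .xml over the network would need a separate, uncompressed object on S3.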