wummel / linkchecker

check links in web documents or full websites
http://wummel.github.io/linkchecker/
GNU General Public License v2.0

CSV output malformatted #633

Open diontruter opened 8 years ago

diontruter commented 8 years ago

I tried all possible ways I could think of to have only a CSV file with links and their errors or warnings. I ended up with this command:

linkchecker --no-status --quiet --output=none --file-output=csv/utf_8/www.mydomain.com.csv --anchors http://www.mydomain.com

There are a number of problems.

[1] Progress status is still written to output, and there is a formatting problem due to a trailing double quote. The format of these messages is:

n URLs parsed.";True;

[2] Comment lines are written to output. e.g. # created by LinkChecker at 2016-02-06 16:57:42+002

[3] Status messages are written to output. The format of these messages is:

Redirected to `http://www.domain.com/'.

In order to get a CSV file that can be used I had to do this:

linkchecker --no-status --quiet --output=none --file-output=csv/utf_8/www.mydomain.com.csv --anchors http://www.mydomain.com
# Delete comment lines
sed -i -e '/^# /d' www.mydomain.com.csv
# Delete status lines
sed -i -e '/URLs parsed\.\"\;True\;/d' www.mydomain.com.csv
# Delete redirect notice lines
sed -i -e '/^Redirected to/d' www.mydomain.com.csv
# Add N/A into empty cells (loop so that runs of adjacent empty cells are all filled)
sed -i -e ':a' -e 's/;;/;N\/A;/' -e 'ta' www.mydomain.com.csv
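
As a rough sketch, the sed cleanup steps above could be folded into a single pass that reads the CSV on stdin and writes the cleaned result to stdout, leaving the original file untouched. The function name clean_linkchecker_csv is a hypothetical helper, not something LinkChecker provides, and the patterns assume the message formats quoted in this report. The N/A substitution loops so that each cell in a run of adjacent empty cells is filled.

```shell
#!/bin/sh
# Hypothetical helper (not part of LinkChecker): filter a LinkChecker CSV
# from stdin to stdout in one sed pass.
#   1. drop comment lines starting with "# "
#   2. drop progress lines such as: 5 URLs parsed.";True;
#   3. drop redirect notices starting with "Redirected to"
#   4. replace every empty cell (;;) with N/A, repeating the substitution
#      until no ";;" is left so that ";;;" becomes ";N/A;N/A;"
clean_linkchecker_csv() {
    sed \
        -e '/^# /d' \
        -e '/URLs parsed\.";True;/d' \
        -e '/^Redirected to/d' \
        -e ':fill' \
        -e 's/;;/;N\/A;/' \
        -e 't fill'
}
```

It could be used as: clean_linkchecker_csv < www.mydomain.com.csv > cleaned.csv (cleaned.csv is an arbitrary output name). Like the original commands, this does not fill an empty cell at the very end of a line.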

May I please ask that you consider suppressing the output of status messages to CSV when --no-status is chosen?

dpalic commented 7 years ago

Thank you for the issue report. Sadly, this project is dead, and a new team has taken it over at https://github.com/linkcheck/linkchecker (for more details, please see #708). Please close this issue and, if the problem still persists, report it again on the new repo: https://github.com/linkcheck/linkchecker/issues