mohankreddy / crawler4j

Automatically exported from code.google.com/p/crawler4j

MP3 file downloaded as a non-binary file #28

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
MP3 files are downloaded as text files. Please help fix this.

I think it is caused by the following code in the Java class
edu.uci.ics.crawler4j.crawler.PageFetcher, in the method
public static int fetch(Page page, boolean ignoreIfBinary):

    Header type = entity.getContentType();
    // Only Content-Type values containing "image" are flagged as binary.
    if (type != null && type.getValue().toLowerCase().contains("image")) {
        isBinary = true;
        if (ignoreIfBinary) {
            return PageFetchStatus.PageIsBinary;
        }
    }

Original issue reported on code.google.com by wanxiang.xing@gmail.com on 18 Mar 2011 at 10:12
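The condition quoted in the report can be isolated into a small method to show why an MP3 slips through. This is a minimal sketch, not crawler4j's code: the class ImageOnlyCheck and the method isBinary are hypothetical names, and the Content-Type value is passed as a plain String instead of an HttpClient Header.

```java
import java.util.Locale;

// Hypothetical stand-alone version of the original PageFetcher check:
// only Content-Type values containing "image" are treated as binary.
class ImageOnlyCheck {

    static boolean isBinary(String contentType) {
        return contentType != null
                && contentType.toLowerCase(Locale.ROOT).contains("image");
    }

    public static void main(String[] args) {
        // MP3 files are normally served as audio/mpeg, which does not contain
        // "image", so they fall through and are handled like a text page.
        String[] samples = {"image/png", "audio/mpeg", "text/html"};
        for (String s : samples) {
            System.out.println(s + " -> binary=" + isBinary(s));
        }
    }
}
```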

GoogleCodeExporter commented 9 years ago
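Patch (replaces the condition in the original fetch() code):
    if (type != null
            && (type.getValue().toLowerCase().contains("image")
                || type.getValue().toLowerCase().contains("audio")
                || type.getValue().toLowerCase().contains("video"))) {
        // ... body unchanged from the original snippet ...
    }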

Original comment by wanxiang.xing@gmail.com on 18 Mar 2011 at 11:22
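The patched condition can be sketched as a stand-alone method to see what it does and does not catch. The class MediaBlacklistCheck and the method isBinary are hypothetical names, and the Content-Type value is a plain String rather than an HttpClient Header. The samples show a limitation of listing binary keywords: a type such as application/zip contains none of "image", "audio" or "video", so it is still treated as text.

```java
import java.util.Locale;

// Hypothetical stand-alone version of the patched check: Content-Type
// values containing "image", "audio" or "video" are treated as binary.
class MediaBlacklistCheck {

    static boolean isBinary(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Lower-case once instead of once per keyword.
        String mimeType = contentType.toLowerCase(Locale.ROOT);
        return mimeType.contains("image")
                || mimeType.contains("audio")
                || mimeType.contains("video");
    }

    public static void main(String[] args) {
        String[] samples = {"audio/mpeg", "video/mp4", "application/zip", "text/html"};
        for (String s : samples) {
            System.out.println(s + " -> binary=" + isBinary(s));
        }
    }
}
```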

GoogleCodeExporter commented 9 years ago
Maybe this way is better:
    if (mimeType == null
            || mimeType.indexOf("text") >= 0
            || mimeType.indexOf("xml") >= 0
            || mimeType.indexOf("javascript") >= 0) {
        // Textual or unknown content: leave isBinary as false.
    } else {
        isBinary = true;
    }

Original comment by wanxiang.xing@gmail.com on 19 Mar 2011 at 3:44

GoogleCodeExporter commented 9 years ago
I have fixed this as you suggested in the SVN version.

-Yasser

Original comment by ganjisaffar@gmail.com on 29 Mar 2011 at 2:49

GoogleCodeExporter commented 9 years ago
Thanks, Yasser!
For downloading zip, doc and other file types, the following is better:

    if (type != null) {
        String mimeType = type.getValue().toLowerCase();
        if (mimeType.indexOf("text") >= 0 || mimeType.indexOf("xml") >= 0
                || mimeType.indexOf("javascript") >= 0) {
            // Textual content: do nothing (isBinary stays false).
        } else {
            isBinary = true;
            if (ignoreIfBinary) {
                return PageFetchStatus.PageIsBinary;
            }
        }
    }

Original comment by wanxiang.xing@gmail.com on 9 Apr 2011 at 5:39
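The whitelist approach inverts the logic: anything that does not look textual is treated as binary, which covers zip, doc, audio and video in one rule. Below is a minimal stand-alone sketch, assuming hypothetical names (TextWhitelistCheck, isBinary) and a plain String in place of the HttpClient Header. It uses String.contains, which is equivalent to the indexOf(...) >= 0 tests in the snippet above, and, like that snippet, does not flag a missing Content-Type.

```java
import java.util.Locale;

// Hypothetical stand-alone version of the whitelist check: only Content-Type
// values that look textual (text, xml, javascript) are kept as pages;
// everything else is treated as binary.
class TextWhitelistCheck {

    static boolean isBinary(String contentType) {
        if (contentType == null) {
            // No Content-Type header: mirror the snippet above and do not flag it.
            return false;
        }
        String mimeType = contentType.toLowerCase(Locale.ROOT);
        boolean looksTextual = mimeType.contains("text")
                || mimeType.contains("xml")
                || mimeType.contains("javascript");
        return !looksTextual;
    }

    public static void main(String[] args) {
        String[] samples = {
            "text/html; charset=UTF-8",
            "application/xhtml+xml",
            "application/javascript",
            "audio/mpeg",
            "application/zip",
            "application/msword"
        };
        for (String s : samples) {
            System.out.println(s + " -> binary=" + isBinary(s));
        }
    }
}
```

Because this is substring matching, some binary types still pass as text: the Word .docx type application/vnd.openxmlformats-officedocument.wordprocessingml.document contains "xml". Comparing the parsed type/subtype against an explicit list would be stricter.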