Closed ttang20913 closed 4 years ago
Hi, could you elaborate on what specifically is different? The .crx contains the built extension and a pre-trained model.
This is the unzipped version of the chrome-extension folder. The structure is different from the chrome-extension folder on github.
The background.js files in the two folders are also different.
Yeah, this is the built extension (as opposed to the source you see in the repository). You should get the same file structure if you build it yourself.
OK, thanks for your help.
I have another question. How can I bypass the Chrome extension to get the predictions? Suppose I have a downloaded HTML file. What pre-processing steps are needed before calling TensorFlow's model.predict() function?
This use case is not implemented, but should be easy enough to do.
You should not use the Chrome extension for this. Instead, load a pre-trained Keras model and vocabulary and pre-process your HTML files. You should find the necessary pre-processing functions in net/preprocess.py. Then you can just call the model to get your predictions.
from preprocess import *
import tensorflow as tf

model = tf.keras.models.load_model('model.049.h5')
input = ["test.html"]
doc_representation, tags, words = parse(input)
for doc_feature_list, doc_label_list in get_doc_inputs(doc_representation['test.html'],words,tags):
In background.js of the chrome extension, function getInputs() will return a tensor. However, in preprocess.py, function get_doc_inputs(), assuming this is the python version of getInputs(), will return two lists.
So, I am confused about the input format for prediction. Can you clarify the input format that is going to be passed to model.predict(), e.g. data type and vector dimensions? Thanks
You need to call get_feature_vector for each leaf in the document. Check line 119 in that file. That should give you the correct model inputs.
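As a rough illustration of "one feature vector per leaf", the sketch below uses a hypothetical stand-in, toy_feature_vector, instead of the repository's get_feature_vector, whose exact output layout is not shown in this thread. The vocabularies and leaf dictionaries are made up; only the call signature mirrors the one used here.

```python
import numpy as np

def toy_feature_vector(words_dict, tags_dict, words_map, tags_map):
    """Hypothetical stand-in: counts of known words, followed by counts of known tags."""
    vec = np.zeros(len(words_map) + len(tags_map), dtype=np.float32)
    for word, count in words_dict.items():
        if word in words_map:  # out-of-vocabulary words are ignored
            vec[words_map[word]] += count
    for tag, count in tags_dict.items():
        if tag in tags_map:
            vec[len(words_map) + tags_map[tag]] += count
    return vec

# Made-up vocabularies mapping token -> index (assumption, for illustration only).
words_map = {"hello": 0, "world": 1}
tags_map = {"p": 0, "div": 1, "a": 2}

# Each leaf is a (words_dict, tags_dict) pair.
leaves = [
    ({"hello": 2}, {"div": 1, "p": 1}),
    ({"world": 1, "unknown": 3}, {"a": 1}),
]

# One row per leaf, one column per vocabulary entry.
doc_matrix = np.stack(
    [toy_feature_vector(w, t, words_map, tags_map) for w, t in leaves]
)
print(doc_matrix.shape)  # (2, 5)
```

The point is only that a document becomes a matrix with one row per leaf; the real feature layout should be taken from net/preprocess.py.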
I followed your suggestion and now I can get the feature vector. However, its format still does not match what the model expects.
The error message is
ValueError: Error when checking input: expected dense_input to have 3 dimensions, but got array with shape (1052, 1)
My implementation:
from preprocess import *
import tensorflow as tf

model = tf.keras.models.load_model('net/model.049.h5')
input = ["net/test.html"]
############# get tag and words from training data
filenames = []
filenames.extend(util.get_filenames("datasets/googletrends/prepared_html/"))
data, tags_map, words_map = parse(filenames)

tags_map = get_vocabulary(tags_map, 50)
words_map = get_vocabulary(words_map, 1000)

with open("tags_map.pkl","wb") as f1:
    pickle.dump(tags_map,f1)
with open("words_map.pkl","wb") as f2:
    pickle.dump(words_map,f2)
##############
######## load tag and words
with open("tags_map.pkl","rb") as f1:
    tags_map = pickle.load(f1)
with open("words_map.pkl","rb") as f2:
    words_map = pickle.load(f2)
########
############## get vector for test.html
result, dummy1, dummy2 = parse(input)

for words_dict, tags_dict, label in result["test.html"]:
    feature_vector = get_feature_vector(words_dict,tags_dict,words_map,tags_map)
    print(feature_vector)
    print("prediction:",model.predict(feature_vector))
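The ValueError above says the first layer expects a 3-dimensional input, while a single feature vector is 1-dimensional. One plausible reading, which is an assumption and not confirmed in this thread, is that the model wants one array per document shaped (batch, leaves, features). The NumPy-only sketch below shows that reshaping; the zero vectors are placeholders for real feature vectors, the leaf count is arbitrary, and 1052 is taken from the error message.

```python
import numpy as np

num_leaves = 4        # arbitrary number of leaves, for illustration
num_features = 1052   # vector length reported in the error message

# Placeholder feature vectors, one per leaf (real ones would come from preprocessing).
leaf_vectors = [np.zeros(num_features, dtype=np.float32) for _ in range(num_leaves)]

# Stack the per-leaf vectors into a (leaves, features) matrix.
doc_matrix = np.stack(leaf_vectors)

# Add a leading batch axis so the array is 3-dimensional: (1, leaves, features).
model_input = np.expand_dims(doc_matrix, axis=0)

print(doc_matrix.shape)   # (4, 1052)
print(model_input.shape)  # (1, 4, 1052)
```

With a real model, model.predict(model_input) would be the next thing to try. Whether this layout matches the trained model can be checked against model.input_shape.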
Hi, please create a new issue for this.
Hi, why is the unpacked chrome-extension .crx file different from the chrome-extension folder provided on GitHub?