Closed vshourie-asu closed 9 months ago
Can I get the full stack trace on the first error, so that I know which function it might come from? :)
If there's no GDPR issue it would also be useful to know what data you used and what hyperparameters you supplied to the model.
Hello, thanks for the response. :)
Can I get the full stack trace on the first error, so that I know which function it might come from? :)
Absolutely. Here you go:
ValueError Traceback (most recent call last)
Cell In[10], line 1
----> 1 topicwizard.visualize(vectorizer=vectorizer, topic_model=dmm, corpus=corpus_cleaned, port=8080)
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:245, in visualize(corpus, vectorizer, topic_model, pipeline, document_names, topic_names, port, enable_notebook)
242 (_, vectorizer), (_, topic_model) = pipeline.steps
244 print("Preprocessing")
--> 245 app = get_dash_app(
246 vectorizer=vectorizer,
247 topic_model=topic_model,
248 corpus=corpus,
249 document_names=document_names,
250 topic_names=topic_names,
251 )
252 return run_app(app, port=port)
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:73, in get_dash_app(vectorizer, topic_model, corpus, document_names, topic_names)
42 def get_dash_app(
43 vectorizer: Any,
44 topic_model: Any,
(...)
47 topic_names: Optional[List[str]] = None,
48 ) -> Dash:
49 """Returns topicwizard Dash application.
50
51 Parameters
(...)
71 Dash application object for topicwizard.
72 """
---> 73 blueprint = get_app_blueprint(
74 vectorizer=vectorizer,
75 topic_model=topic_model,
76 corpus=corpus,
77 document_names=document_names,
78 topic_names=topic_names,
79 )
80 app = Dash(
81 __name__,
82 blueprint=blueprint,
(...)
92 ],
93 )
94 return app
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:31, in get_app_blueprint(vectorizer, topic_model, corpus, document_names, topic_names)
24 def get_app_blueprint(
25 vectorizer: Any,
26 topic_model: Any,
(...)
29 topic_names: Optional[List[str]] = None,
30 ) -> DashBlueprint:
---> 31 blueprint = prepare_blueprint(
32 vectorizer=vectorizer,
33 topic_model=topic_model,
34 corpus=corpus,
35 document_names=document_names,
36 topic_names=topic_names,
37 create_blueprint=create_blueprint,
38 )
39 return blueprint
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\template.py:31, in prepare_blueprint(vectorizer, topic_model, corpus, create_blueprint, document_names, topic_names)
29 if topic_names is None:
30 topic_names = [f"Topic {i}" for i in range(n_topics)]
---> 31 blueprint = create_blueprint(
32 vocab=vocab,
33 document_term_matrix=document_term_matrix,
34 document_topic_matrix=document_topic_matrix,
35 topic_term_matrix=topic_term_matrix,
36 document_names=document_names,
37 corpus=corpus,
38 vectorizer=vectorizer,
39 topic_model=topic_model,
40 topic_names=topic_names,
41 )
42 return blueprint
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\app.py:35, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, document_names, corpus, vectorizer, topic_model, topic_names)
23 def create_blueprint(
24 vocab: np.ndarray,
25 document_term_matrix: np.ndarray,
(...)
33 ) -> DashBlueprint:
34 # --------[ Collecting blueprints ]--------
---> 35 topic_blueprint = topics.create_blueprint(
36 vocab=vocab,
37 document_term_matrix=document_term_matrix,
38 document_topic_matrix=document_topic_matrix,
39 topic_term_matrix=topic_term_matrix,
40 document_names=document_names,
41 corpus=corpus,
42 vectorizer=vectorizer,
43 topic_model=topic_model,
44 topic_names=topic_names,
45 )
46 documents_blueprint = documents.create_blueprint(
47 vocab=vocab,
48 document_term_matrix=document_term_matrix,
(...)
55 topic_names=topic_names,
56 )
57 words_blueprint = words.create_blueprint(
58 vocab=vocab,
59 document_term_matrix=document_term_matrix,
(...)
66 topic_names=topic_names,
67 )
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\topics.py:65, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, topic_names, **kwargs)
56 (
57 topic_importances,
58 term_importances,
(...)
61 topic_term_matrix, document_term_matrix, document_topic_matrix
62 )
64 # --------[ Collecting blueprints ]--------
---> 65 intertopic_map = create_intertopic_map(
66 topic_positions, topic_importances, topic_names
67 )
68 blueprints = [
69 intertopic_map,
70 relevance_slider,
(...)
74 wordcloud,
75 ]
76 # layouts = [blueprint.layout for blueprint in blueprints]
77
78 # --------[ Creating app blueprint ]--------
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\components\topics\intertopic_map.py:29, in create_intertopic_map(topic_positions, topic_importances, topic_names)
20 x, y = topic_positions
22 intertopic_map = DashBlueprint()
24 intertopic_map.layout = dcc.Graph(
25 id="intertopic_map",
26 responsive=True,
27 config=dict(scrollZoom=True),
28 animate=True,
---> 29 figure=plots.intertopic_map(
30 x=x,
31 y=y,
32 topic_importances=topic_importances,
33 topic_names=topic_names,
34 ),
35 className="flex-1",
36 )
38 intertopic_map.clientside_callback(
39 """
40 function(currentTopic, topicNames, currentPlot) {
(...)
61 prevent_initial_call=True,
62 )
63 return intertopic_map
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\plots\topics.py:18, in intertopic_map(x, y, topic_importances, topic_names)
11 def intertopic_map(
12 x: np.ndarray,
13 y: np.ndarray,
14 topic_importances: np.ndarray,
15 topic_names: List[str],
16 ) -> go.Figure:
17 n_topics = x.shape[0]
---> 18 topic_trace = go.Scatter(
19 x=x,
20 y=y,
21 mode="text+markers",
22 text=topic_names,
23 marker=dict(
24 size=topic_importances,
25 sizemode="area",
26 sizeref=2.0 * max(topic_importances) / (100.0**2),
27 sizemin=4,
28 color="rgb(168,162,158)",
29 ),
30 customdata=np.atleast_2d(np.arange(x.shape[0])).T,
31 )
32 fig = go.Figure([topic_trace])
33 fig.update_layout(
34 clickmode="event",
35 modebar_remove=["lasso2d", "select2d"],
(...)
40 margin=dict(l=0, r=0, b=0, t=0, pad=0),
41 )
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\graph_objs\_scatter.py:3378, in Scatter.__init__(self, arg, alignmentgroup, cliponaxis, connectgaps, customdata, customdatasrc, dx, dy, error_x, error_y, fill, fillcolor, fillpattern, groupnorm, hoverinfo, hoverinfosrc, hoverlabel, hoveron, hovertemplate, hovertemplatesrc, hovertext, hovertextsrc, ids, idssrc, legend, legendgroup, legendgrouptitle, legendrank, legendwidth, line, marker, meta, metasrc, mode, name, offsetgroup, opacity, orientation, selected, selectedpoints, showlegend, stackgaps, stackgroup, stream, text, textfont, textposition, textpositionsrc, textsrc, texttemplate, texttemplatesrc, uid, uirevision, unselected, visible, x, x0, xaxis, xcalendar, xhoverformat, xperiod, xperiod0, xperiodalignment, xsrc, y, y0, yaxis, ycalendar, yhoverformat, yperiod, yperiod0, yperiodalignment, ysrc, **kwargs)
3376 _v = marker if marker is not None else _v
3377 if _v is not None:
-> 3378 self["marker"] = _v
3379 _v = arg.pop("meta", None)
3380 _v = meta if meta is not None else _v
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:4865, in BasePlotlyType.__setitem__(self, prop, value)
4863 # ### Handle compound property ###
4864 if isinstance(validator, CompoundValidator):
-> 4865 self._set_compound_prop(prop, value)
4867 # ### Handle compound array property ###
4868 elif isinstance(validator, (CompoundArrayValidator, BaseDataValidator)):
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:5276, in BasePlotlyType._set_compound_prop(self, prop, val)
5273 # Import value
5274 # ------------
5275 validator = self._get_validator(prop)
-> 5276 val = validator.validate_coerce(val, skip_invalid=self._skip_invalid)
5278 # Save deep copies of current and new states
5279 # ------------------------------------------
5280 curr_val = self._compound_props.get(prop, None)
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\_plotly_utils\basevalidators.py:2475, in CompoundValidator.validate_coerce(self, v, skip_invalid, _validate)
2472 v = self.data_class()
2474 elif isinstance(v, dict):
-> 2475 v = self.data_class(v, skip_invalid=skip_invalid, _validate=_validate)
2477 elif isinstance(v, self.data_class):
2478 # Copy object
2479 v = self.data_class(v)
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\graph_objs\scatter\_marker.py:1674, in Marker.__init__(self, arg, angle, angleref, anglesrc, autocolorscale, cauto, cmax, cmid, cmin, color, coloraxis, colorbar, colorscale, colorsrc, gradient, line, maxdisplayed, opacity, opacitysrc, reversescale, showscale, size, sizemin, sizemode, sizeref, sizesrc, standoff, standoffsrc, symbol, symbolsrc, **kwargs)
1672 _v = size if size is not None else _v
1673 if _v is not None:
-> 1674 self["size"] = _v
1675 _v = arg.pop("sizemin", None)
1676 _v = sizemin if sizemin is not None else _v
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:4873, in BasePlotlyType.__setitem__(self, prop, value)
4869 self._set_array_prop(prop, value)
4871 # ### Handle simple property ###
4872 else:
-> 4873 self._set_prop(prop, value)
4874 else:
4875 # Make sure properties dict is initialized
4876 self._init_props()
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:5217, in BasePlotlyType._set_prop(self, prop, val)
5215 return
5216 else:
-> 5217 raise err
5219 # val is None
5220 # -----------
5221 if val is None:
5222 # Check if we should send null update
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\plotly\basedatatypes.py:5212, in BasePlotlyType._set_prop(self, prop, val)
5209 validator = self._get_validator(prop)
5211 try:
-> 5212 val = validator.validate_coerce(val)
5213 except ValueError as err:
5214 if self._skip_invalid:
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\_plotly_utils\basevalidators.py:777, in NumberValidator.validate_coerce(self, v)
772 v_invalid = np.logical_not(v_valid)
773 some_invalid_els = np.array(v, dtype="object")[v_invalid][
774 :10
775 ].tolist()
--> 777 self.raise_invalid_elements(some_invalid_els)
779 v = v_array # Always numeric numpy array
780 elif self.array_ok and is_simple_array(v):
781 # Check numeric
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\_plotly_utils\basevalidators.py:303, in BaseValidator.raise_invalid_elements(self, invalid_els)
301 def raise_invalid_elements(self, invalid_els):
302 if invalid_els:
--> 303 raise ValueError(
304 """
305 Invalid element(s) received for the '{name}' property of {pname}
306 Invalid elements include: {invalid}
307
308 {valid_clr_desc}""".format(
309 name=self.plotly_name,
310 pname=self.parent_name,
311 invalid=invalid_els[:10],
312 valid_clr_desc=self.description(),
313 )
314 )
ValueError:
Invalid element(s) received for the 'size' property of scatter.marker
Invalid elements include: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]
The 'size' property is a number and may be specified as:
- An int or float in the interval [0, inf]
- A tuple, list, or one-dimensional numpy array of the above
If there's no GDPR issue it would also be useful to know what data you used and what hyperparameters you supplied to the model.
There are data privacy concerns (FERPA, to be exact). Therefore, it's not a good idea for me to share my dataset.
But here's a bit of domain context:
Hyperparameters for DMM model on Tweetopic:
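The exact values aren't reproduced here, but the general shape of my setup is the following (placeholder hyperparameters, not the ones from my actual run):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from tweetopic import DMM

# Placeholder hyperparameters for illustration only
vectorizer = CountVectorizer(min_df=5, max_df=0.5)
dmm = DMM(n_components=15, n_iterations=200, alpha=0.1, beta=0.1)
pipeline = Pipeline([("vectorizer", vectorizer), ("dmm", dmm)])
pipeline.fit(corpus_cleaned)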
Thanks for the info, I will try to deliver a fix as quickly as possible. I think you were right in your judgment: it has to be the nans being output by tweetopic. In the meantime you can try to identify which texts are problematic (i.e. result in nans) and remove them before you pass the corpus as a list of texts to topicwizard.
I checked your Colab notebook, and I think there are some pandas shenanigans going on when you try to remove the texts. I would try:
import numpy as np
import topicwizard

# Topic inferences for the whole corpus
transformed_corpus = pipeline.transform(corpus)
# Turning the corpus into an array so you can index it with a boolean mask
filtered_corpus = np.array(corpus)
# Mask marking the documents whose inference contains nans
problematic_mask = np.isnan(transformed_corpus).any(axis=1)
# Removing them
filtered_corpus = filtered_corpus[~problematic_mask]
topicwizard.visualize(pipeline=pipeline, corpus=filtered_corpus)
I think this should work fine. I will try to address these issues in the meantime.
I managed to reproduce the error with a custom version of NMF that randomly assigns nans to certain observations:
import numpy as np
from sklearn.decomposition import NMF


class RandomNanNMF(NMF):
    """NMF whose outputs contain nans for 30 randomly chosen documents."""

    def transform(self, X):
        res = super().transform(X)
        n_docs = res.shape[0]
        nans = np.random.choice(np.arange(n_docs), size=30, replace=False)
        res[nans, :] = np.nan
        return res

    def fit_transform(self, X, y=None, W=None, H=None):
        res = super().fit_transform(X, y, W, H)
        n_docs = res.shape[0]
        nans = np.random.choice(np.arange(n_docs), size=30, replace=False)
        res[nans, :] = np.nan
        return res
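For completeness, here is roughly how I plugged it in to reproduce the crash (a sketch: 20 newsgroups stands in for an arbitrary corpus, and the model is fitted before visualizing, as in your setup):
import topicwizard
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Any corpus with more than 30 documents will do; 20 newsgroups is a stand-in
corpus = fetch_20newsgroups(remove=("headers", "footers", "quotes")).data
vectorizer = CountVectorizer(stop_words="english", max_features=8000)
model = RandomNanNMF(n_components=10)
model.fit(vectorizer.fit_transform(corpus))
# transform() now injects nans, triggering the same marker size ValueError
topicwizard.visualize(vectorizer=vectorizer, topic_model=model, corpus=corpus)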
The solution was to filter out the nan values in the preprocessing step of topicwizard and throw a warning to the user informing them about the removal of these documents.
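Concretely, the preprocessing change amounts to something like the following sketch (illustrative names, not the library's exact internals):
import numpy as np
from typing import List, Tuple
from warnings import warn


def remove_nan_documents(
    document_topic_matrix: np.ndarray,
    document_term_matrix: np.ndarray,
    corpus: List[str],
) -> Tuple[np.ndarray, np.ndarray, List[str]]:
    # Mask of documents whose topic distribution contains nans
    nan_documents = np.isnan(document_topic_matrix).any(axis=1)
    n_nan_docs = int(nan_documents.sum())
    if n_nan_docs:
        warn(
            f"{n_nan_docs} documents had nan values in the output of the "
            "topic model, these are removed in preprocessing and will not "
            "be visible in the app."
        )
        keep = ~nan_documents
        document_topic_matrix = document_topic_matrix[keep]
        document_term_matrix = document_term_matrix[keep]
        corpus = [text for text, kept in zip(corpus, keep) if kept]
    return document_topic_matrix, document_term_matrix, corpus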
Fix merged into main, and a new version has been built and published to PyPI. You should try installing topicwizard 0.2.6 and running your code again :)
Thank you! I'll give it a shot now.
Can you confirm that the fix worked?
Hi!
Sorry for the long wait; this query got lost in my massive work email mountain.
I reran the visualization command with version 0.2.6 installed.
I get the following error after running. Note that the UserWarning shows up, which means your validation is working as intended.
C:\Users\vshourie\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\template.py:33: UserWarning: 31 documents had nan values in the output of the topic model, these are removed in preprocessing and will not be visible in the app.
warn(
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[8], line 1
----> 1 topicwizard.visualize(vectorizer=vectorizer, topic_model=dmm, corpus=corpus_cleaned, port=8080)
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:245, in visualize(corpus, vectorizer, topic_model, pipeline, document_names, topic_names, port, enable_notebook)
242 (_, vectorizer), (_, topic_model) = pipeline.steps
244 print("Preprocessing")
--> 245 app = get_dash_app(
246 vectorizer=vectorizer,
247 topic_model=topic_model,
248 corpus=corpus,
249 document_names=document_names,
250 topic_names=topic_names,
251 )
252 return run_app(app, port=port)
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:73, in get_dash_app(vectorizer, topic_model, corpus, document_names, topic_names)
42 def get_dash_app(
43 vectorizer: Any,
44 topic_model: Any,
(...)
47 topic_names: Optional[List[str]] = None,
48 ) -> Dash:
49 """Returns topicwizard Dash application.
50
51 Parameters
(...)
71 Dash application object for topicwizard.
72 """
---> 73 blueprint = get_app_blueprint(
74 vectorizer=vectorizer,
75 topic_model=topic_model,
76 corpus=corpus,
77 document_names=document_names,
78 topic_names=topic_names,
79 )
80 app = Dash(
81 __name__,
82 blueprint=blueprint,
(...)
92 ],
93 )
94 return app
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\app.py:31, in get_app_blueprint(vectorizer, topic_model, corpus, document_names, topic_names)
24 def get_app_blueprint(
25 vectorizer: Any,
26 topic_model: Any,
(...)
29 topic_names: Optional[List[str]] = None,
30 ) -> DashBlueprint:
---> 31 blueprint = prepare_blueprint(
32 vectorizer=vectorizer,
33 topic_model=topic_model,
34 corpus=corpus,
35 document_names=document_names,
36 topic_names=topic_names,
37 create_blueprint=create_blueprint,
38 )
39 return blueprint
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\template.py:44, in prepare_blueprint(vectorizer, topic_model, corpus, create_blueprint, document_names, topic_names)
42 if topic_names is None:
43 topic_names = [f"Topic {i}" for i in range(n_topics)]
---> 44 blueprint = create_blueprint(
45 vocab=vocab,
46 document_term_matrix=document_term_matrix,
47 document_topic_matrix=document_topic_matrix,
48 topic_term_matrix=topic_term_matrix,
49 document_names=document_names,
50 corpus=corpus,
51 vectorizer=vectorizer,
52 topic_model=topic_model,
53 topic_names=topic_names,
54 )
55 return blueprint
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\app.py:46, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, document_names, corpus, vectorizer, topic_model, topic_names)
23 def create_blueprint(
24 vocab: np.ndarray,
25 document_term_matrix: np.ndarray,
(...)
33 ) -> DashBlueprint:
34 # --------[ Collecting blueprints ]--------
35 topic_blueprint = topics.create_blueprint(
36 vocab=vocab,
37 document_term_matrix=document_term_matrix,
(...)
44 topic_names=topic_names,
45 )
---> 46 documents_blueprint = documents.create_blueprint(
47 vocab=vocab,
48 document_term_matrix=document_term_matrix,
49 document_topic_matrix=document_topic_matrix,
50 topic_term_matrix=topic_term_matrix,
51 document_names=document_names,
52 corpus=corpus,
53 vectorizer=vectorizer,
54 topic_model=topic_model,
55 topic_names=topic_names,
56 )
57 words_blueprint = words.create_blueprint(
58 vocab=vocab,
59 document_term_matrix=document_term_matrix,
(...)
66 topic_names=topic_names,
67 )
68 blueprints = [
69 topic_blueprint,
70 words_blueprint,
71 documents_blueprint,
72 ]
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\blueprints\documents.py:32, in create_blueprint(vocab, document_term_matrix, document_topic_matrix, topic_term_matrix, document_names, corpus, vectorizer, topic_model, **kwargs)
19 def create_blueprint(
20 vocab: np.ndarray,
21 document_term_matrix: np.ndarray,
(...)
29 ) -> DashBlueprint:
30 # --------[ Preparing data ]--------
31 n_topics = topic_term_matrix.shape[0]
---> 32 document_positions = prepare.document_positions(
33 document_term_matrix=document_term_matrix
34 )
35 dominant_topics = prepare.dominant_topic(
36 document_topic_matrix=document_topic_matrix
37 )
38 # Creating unified color scheme
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\topicwizard\prepare\documents.py:47, in document_positions(document_term_matrix)
41 perplexity = np.min((40, n_docs - 1))
42 manifold = umap.UMAP(
43 n_components=2,
44 n_neighbors=perplexity,
45 metric="cosine",
46 )
---> 47 x, y = manifold.fit_transform(document_term_matrix).T
48 return x, y
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\umap\umap_.py:2772, in UMAP.fit_transform(self, X, y)
2742 def fit_transform(self, X, y=None):
2743 """Fit X into an embedded space and return that transformed
2744 output.
2745
(...)
2770 Local radii of data points in the embedding (log-transformed).
2771 """
-> 2772 self.fit(X, y)
2773 if self.transform_mode == "embedding":
2774 if self.output_dens:
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\umap\umap_.py:2516, in UMAP.fit(self, X, y)
2510 nn_metric = self._input_distance_func
2511 if self.knn_dists is None:
2512 (
2513 self._knn_indices,
2514 self._knn_dists,
2515 self._knn_search_index,
-> 2516 ) = nearest_neighbors(
2517 X[index],
2518 self._n_neighbors,
2519 nn_metric,
2520 self._metric_kwds,
2521 self.angular_rp_forest,
2522 random_state,
2523 self.low_memory,
2524 use_pynndescent=True,
2525 n_jobs=self.n_jobs,
2526 verbose=self.verbose,
2527 )
2528 else:
2529 self._knn_indices = self.knn_indices
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\umap\umap_.py:328, in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
325 n_trees = min(64, 5 + int(round((X.shape[0]) ** 0.5 / 20.0)))
326 n_iters = max(5, int(round(np.log2(X.shape[0]))))
--> 328 knn_search_index = NNDescent(
329 X,
330 n_neighbors=n_neighbors,
331 metric=metric,
332 metric_kwds=metric_kwds,
333 random_state=random_state,
334 n_trees=n_trees,
335 n_iters=n_iters,
336 max_candidates=60,
337 low_memory=low_memory,
338 n_jobs=n_jobs,
339 verbose=verbose,
340 compressed=False,
341 )
342 knn_indices, knn_dists = knn_search_index.neighbor_graph
344 if verbose:
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\pynndescent\pynndescent_.py:804, in NNDescent.__init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, init_dist, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, parallel_batch_queries, verbose)
793 print(ts(), "Building RP forest with", str(n_trees), "trees")
794 self._rp_forest = make_forest(
795 data,
796 n_neighbors,
(...)
802 self._angular_trees,
803 )
--> 804 leaf_array = rptree_leaf_array(self._rp_forest)
805 else:
806 self._rp_forest = None
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\pynndescent\rp_trees.py:1097, in rptree_leaf_array(rp_forest)
1095 def rptree_leaf_array(rp_forest):
1096 if len(rp_forest) > 0:
-> 1097 return np.vstack(rptree_leaf_array_parallel(rp_forest))
1098 else:
1099 return np.array([[-1]])
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\pynndescent\rp_trees.py:1089, in rptree_leaf_array_parallel(rp_forest)
1088 def rptree_leaf_array_parallel(rp_forest):
-> 1089 result = joblib.Parallel(n_jobs=-1, require="sharedmem")(
1090 joblib.delayed(get_leaves_from_tree)(rp_tree) for rp_tree in rp_forest
1091 )
1092 return result
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:1098, in Parallel.__call__(self, iterable)
1095 self._iterating = False
1097 with self._backend.retrieval_context():
-> 1098 self.retrieve()
1099 # Make sure that we get a last message telling us we are done
1100 elapsed_time = time.time() - self._start_time
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:975, in Parallel.retrieve(self)
973 try:
974 if getattr(self._backend, 'supports_timeout', False):
--> 975 self._output.extend(job.get(timeout=self.timeout))
976 else:
977 self._output.extend(job.get())
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\multiprocessing\pool.py:771, in ApplyResult.get(self, timeout)
769 return self._value
770 else:
--> 771 raise self._value
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\multiprocessing\pool.py:125, in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
123 job, i, func, args, kwds = task
124 try:
--> 125 result = (True, func(*args, **kwds))
126 except Exception as e:
127 if wrap_exception and func is not _helper_reraises_exception:
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\_parallel_backends.py:620, in SafeFunction.__call__(self, *args, **kwargs)
618 def __call__(self, *args, **kwargs):
619 try:
--> 620 return self.func(*args, **kwargs)
621 except KeyboardInterrupt as e:
622 # We capture the KeyboardInterrupt and reraise it as
623 # something different, as multiprocessing does not
624 # interrupt processing for a KeyboardInterrupt
625 raise WorkerInterrupt() from e
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:288, in BatchedCalls.__call__(self)
284 def __call__(self):
285 # Set the default nested backend to self._backend but do not set the
286 # change the default number of processes to -1
287 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288 return [func(*args, **kwargs)
289 for func, args, kwargs in self.items]
File ~\AppData\Local\miniconda3\envs\tweetopic\lib\site-packages\joblib\parallel.py:288, in <listcomp>(.0)
284 def __call__(self):
285 # Set the default nested backend to self._backend but do not set the
286 # change the default number of processes to -1
287 with parallel_backend(self._backend, n_jobs=self._n_jobs):
--> 288 return [func(*args, **kwargs)
289 for func, args, kwargs in self.items]
ValueError: cannot assign slice from input of different size
Hello! I am very impressed with this library as per Marton Kardos's article on Medium.
I attempted to use topicwizard to visualize short-text topic modeling inferences from a quickly trained tweetopic model. My issues and troubleshooting steps are documented in this hosted Google Colab notebook. Please note that you can't run the notebook; I've published it only so you can view it easily via Google Colab.
Information about my Conda environment:
I can train a topic model in tweetopic and import the topicwizard module with no problems. Once I've finished training my tweetopic model, I can infer topic names via
topicwizard.infer_topic_names(pipeline=pipeline)
with no problems. However, when I attempt to run
topicwizard.visualize(vectorizer=vectorizer, topic_model=dmm, corpus=corpus_cleaned, port=8080)
I receive the error shown in the first traceback above. I troubleshot and found that when I .transform(...) my corpus post-training, some of the inferences contain nans (see the sketch after this paragraph). I dropped those rows so that they don't interfere with the elaborate computations the /prepare/<...>.py files have in place to get the Dash app running. Despite cleaning up the nans, when I run the same .visualize() call above with the further-cleaned inferences, I receive the following error tracing back to ...tweetopic/lib/site-packages/joblib/parallel.py:288
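The nan check I ran looked roughly like this (a sketch; pipeline is my fitted vectorizer + DMM pipeline):
import numpy as np

# Topic inferences for the cleaned corpus; some rows come back as nan
doc_topic = pipeline.transform(corpus_cleaned)
nan_rows = np.isnan(doc_topic).any(axis=1)
print(f"{nan_rows.sum()} documents produced nan inferences")
# Drop the offending documents before visualizing
corpus_cleaned = [text for text, bad in zip(corpus_cleaned, nan_rows) if not bad]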
Further context on the steps I followed is available in that Google Colab notebook. Could anyone help me figure out what is preventing me from getting the Dash app working? Thank you!